WritinGenomics

Why scales beat sequencers for giant genomes

How do we estimate the size of the largest genomes on Earth? I’m thinking of genomes over 100 billion base pairs (bp), genomes so massive that, if unwound, they would stand taller than Big Ben!

You may think that DNA sequencing is the obvious answer. After all, what better way to gauge the size of a new genome than to read it from start to finish? In reality, it’s not that simple. When it comes to giant genomes, scientists drop the sequencer (gently!) and use the “scale” instead. 

In this post, we’ll see why sequencing fails and how you weigh a genome.

GENOMIC GIANT                GENOME SIZE
Hanging fork fern            160.5 billion bp
Japanese canopy plant        148.9 billion bp
Long fork fern               147.3 billion bp
Marbled lungfish             130.1 billion bp
Japanese hybrid wakerobin    129.5 billion bp
South American lungfish      118.0 billion bp
Dwarf waterdog               116.6 billion bp
Kamchatka wakerobin          109.0 billion bp
Common mistletoe             100.6 billion bp

Giant genomes, as discussed last year in my blog

WHY SEQUENCING GIANTS IS HARD

Most genomes contain large repetitive regions, stretches of DNA that appear again and again, even thousands of times. And when it comes to sequencing, repetitive DNA is a nightmare. Why? 

It comes down to how sequencing works. Before a genome can be read, it must be broken into millions of tiny fragments. Once these fragments are read, computers must find where they overlap so they can be stitched back together. When you sequence a genome for the first time, you simply can’t tell how many times a repeated sequence actually occurs. You may know that a chunk of a chromosome is made up of many copies of the sequence TTCCCGGCCC, but you won’t know if it appears a thousand or ten thousand times.

Let’s pretend a genome is a jigsaw puzzle of a sunny day in an alpine village, cut into an almost endless number of pieces (the sequencing reads). Your task is to piece together the puzzle (to assemble the genome). If you don’t have the picture on the box to guide you (a reference genome), you’ll never be sure how many identical pieces (the TTCCCGGCCC sequence) make up the blue sky (the repetitive region)!
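To make the puzzle problem concrete, here is a minimal Python sketch. Everything in it is a toy assumption: the flanking sequences, the 25 bp read length and the helper names are made up for illustration, and it ignores how many times each read is seen. It builds two genomes that differ only in how many copies of TTCCCGGCCC they carry, then compares the distinct short reads each one can produce.

```python
REPEAT = "TTCCCGGCCC"

def make_genome(copies):
    # Hypothetical toy genome: unique flank + tandem repeat array + unique flank.
    return "ACGTACGTAGCTAGCTAGGA" + REPEAT * copies + "GATTACAGATTACACCGGTA"

def distinct_reads(genome, read_len):
    # Every distinct substring of length read_len, i.e. every possible short read.
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

small = make_genome(1_000)     # 1,000 copies of the repeat
large = make_genome(10_000)    # 10,000 copies of the repeat

# Reads much shorter than the repeat array look the same in both genomes.
same_reads = distinct_reads(small, 25) == distinct_reads(large, 25)
print(len(small), len(large), same_reads)
```

The two genomes differ in length by 90,000 bp, yet the sets of distinct 25 bp reads are identical: the pieces of blue sky give no hint of how big the sky is.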


As the size of these repetitive regions is shrouded in uncertainty, the total size of the whole genome remains an open question. The more abundant the repeats, the higher the uncertainty. This is especially true for giant genomes, which are “choking” on billions and billions of base pairs of nothing but repeats.

That’s why weighing is often the best way to gauge the size of a giant genome. This is how it’s done.

HOW TO WEIGH A GIANT GENOME

Instead of using a sequencer, researchers put the nuclei containing the genomes on a special scale called a flow cytometer.

A flow cytometer is a system of microtubes that lines up one nucleus (or cell) at a time in front of a laser. This laser is used to pry information out of the objects it hits.

In brief, this is how researchers weigh a genome of unknown size:

  1. Tissues and cells are broken to release nuclei.
  2. Nuclei are stained with a dye specific for DNA. The chemical wedges itself between the bases of the double helix in a fixed ratio.
  3. This mixture is fed into the flow cytometer.
  4. As the laser hits each nucleus, the dye absorbs the energy (it becomes excited, in jargon) and emits fluorescence back.
  5. A dedicated detector measures this fluorescence.

As the number of dye molecules bound is proportional to the amount of DNA in the nucleus, the “heavier” the genome, the higher the fluorescence of each nucleus. By comparing the fluorescence emitted by the unknown genome to the fluorescence emitted by a standard (a genome whose size is known and comparable to the sample), researchers can estimate its weight.
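The comparison with the standard boils down to a simple ratio. The Python sketch below is only an illustration: the fluorescence readings are invented, the function name is hypothetical, and taking the median of the per-nucleus readings as the peak position is my simplifying assumption, not a description of any particular analysis software.

```python
from statistics import median

def estimate_dna_pg(sample_fluorescence, standard_fluorescence, standard_pg):
    # Assumption: the median of the per-nucleus readings stands in for the peak.
    sample_peak = median(sample_fluorescence)
    standard_peak = median(standard_fluorescence)
    # Fluorescence scales with DNA content, so the ratio of the peaks
    # rescales the known weight of the standard.
    return standard_pg * sample_peak / standard_peak

# Invented per-nucleus fluorescence readings (arbitrary units).
standard = [198.0, 200.0, 202.0, 201.0, 199.0]
sample = [498.0, 500.0, 503.0, 501.0, 499.0]

estimate = estimate_dna_pg(sample, standard, standard_pg=10.0)
print(estimate)  # 25.0 picograms for these made-up numbers
```

In this example the sample glows 2.5 times brighter than a 10 pg standard, so its estimated DNA content is 25 pg.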

The result is expressed in picograms (trillionths of a gram). Finally, the weight is converted to base pairs: 1 picogram is equal to 0.978 billion bp (Gregory & Hebert, 1999).
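As a quick sanity check on that conversion, here is a small Python sketch using the factor quoted above. The function names are my own; the 160.5 billion bp figure is the hanging fork fern from the table at the top of this post.

```python
BP_PER_PG = 0.978e9  # base pairs per picogram, the conversion factor quoted above

def pg_to_bp(picograms):
    # Weight of DNA (pg) to genome size (bp).
    return picograms * BP_PER_PG

def bp_to_pg(base_pairs):
    # Genome size (bp) back to weight of DNA (pg).
    return base_pairs / BP_PER_PG

fern_pg = bp_to_pg(160.5e9)  # hanging fork fern, 160.5 billion bp
print(round(fern_pg, 1))     # about 164 pg
```

In other words, the record-holding genome in the table weighs roughly 164 picograms.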

DIFFERENT TOOLS FOR DIFFERENT QUESTIONS

Don’t get me wrong: sequencing technologies are becoming increasingly capable of handling giant genomes. This advance is thanks to long-read sequencing, a family of technologies (such as PacBio and Oxford Nanopore) that can read tens or hundreds of thousands of bp, and occasionally more than a million, in a single stretch rather than in short fragments of a few hundred bp. With long reads, sequencers cover a swath of repetitive sequences in a single breath, clearing much of our uncertainty.
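Here is a toy Python sketch of why read length matters. The genome, the flanking sequences and the function name are all hypothetical: a read only reveals the copy number when it covers the whole repeat array plus some non-repeat sequence on both sides.

```python
import re

REPEAT = "TTCCCGGCCC"

def count_copies_in_spanning_read(read, repeat=REPEAT):
    # Find the longest run of back-to-back copies of the repeat in the read.
    runs = re.finditer(f"(?:{repeat})+", read)
    longest = max(runs, key=lambda m: len(m.group()), default=None)
    if longest is None:
        return None
    # Require at least one repeat-length of other sequence on each side;
    # otherwise the array may continue beyond the read and the count is unknown.
    left_flank = longest.start()
    right_flank = len(read) - longest.end()
    if left_flank < len(repeat) or right_flank < len(repeat):
        return None
    return len(longest.group()) // len(repeat)

# Hypothetical toy genome: unique flank + 1,000 repeat copies + unique flank.
genome = "ACGTACGTAGCTAGCTAGGA" + REPEAT * 1_000 + "GATTACAGATTACACCGGTA"

long_read = genome            # one read covering the array and both flanks
short_read = genome[100:125]  # a 25 bp read from inside the array

print(count_copies_in_spanning_read(long_read))   # 1000
print(count_copies_in_spanning_read(short_read))  # None
```

The long read counts all 1,000 copies, while the short read, stuck in the middle of the blue sky, cannot say anything about the array's size.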

On the other hand, flow cytometry has its own limitations. Its readout can be skewed by how samples are prepared, by the dye and standard used, or by the biology of the sample itself. However, when it comes to just weighing a genome — especially a massive one — flow cytometry is still hard to beat: quick, relatively cheap, easier to perform, and reliable when used with care.

As always, different questions call for different tools. If you want to understand why genomes can balloon to Big‑Ben size, grab a sequencer. But if you want to find those giant genomes in the first place, reach for the flow cytometer.
