
What’s Next


Almost as soon as the first genome was sequenced at the start of the last decade, the race was on to bring whole genome sequencing to the consumer level. Looking back over this time, we can see the remnants of this contest: 454, Helicos and others. For now Illumina seems to be the winner of this round, with a few stragglers left behind, as we approach the long sought-after goal of the $1,000 genome. As we reach this point, genome sequencing will undergo an important transition from a headline-grabbing research methodology to a practical tool for clinical diagnosis. Clinical applications will bring a new influx of funding, as sequencing is applied on a wide scale and as healthcare providers inflate the price of the $1,000 genome to $10,000 and more (source: my street smarts). They will also impose unique market demands on this technology, and it makes sense to ask now what these new demands will mean for the NGS industry.

Size Matters
Until now, “short read” sequencing has been synonymous with NGS, reflecting the fact that most of the early NGS technologies achieved high base output by producing large numbers of short reads, ranging from tens to hundreds of bases. This contrasts with the previous standard, Sanger sequencing, which was capable of reading sequences of more than 1,000 bases. As the research results of the past decade demonstrate, a wide range of information can be gleaned from such short reads, but they do have their limitations. As de novo genome assembly becomes a more common application in clinical settings, there is a greater need for sequencing techniques that can get past these short read lengths and sequence through highly repetitive regions of the genome, which current sequencing approaches often just ignore as “unmappable.” While the read length of 454 eventually grew to over 2,000 bases (even if its total base output remained too low to be usable), modern sequencing methodologies have had a hard time increasing their read lengths. Illumina’s sequencing platforms are currently limited to 150bp paired-end reads, while Ion Torrent sequencing is limited to 200bp. While the total base output of these platforms is enough to hold the public’s imagination for now, it seems only a matter of time until they are forced to come to terms with the shortcomings of short reads.

Limits of Clonal Methods

Despite the diversity of sequencing approaches, most (including Illumina, Ion Torrent, Complete Genomics and SOLiD) have in common the use of clonal amplification to increase signal strength: libraries are ligated to various sorts of adapters and amplified by PCR. The resulting products are attached to a flowcell or similar surface, and sequencing progress (polymerase- or ligase-based) is monitored cycle by cycle, in most cases by imaging fluorescently labeled base analogues (Ion Torrent instead detects the pH change that accompanies base incorporation). These types of approaches are inherently limited in their ability to extend beyond a few hundred bases, since errors are cumulative, causing the error rate to increase exponentially with read length. After a few hundred bases, it’s no longer possible to keep the reads within a given colony in phase at a suitably high quality level.
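To get a feel for why phasing caps read length, here is a minimal sketch in Python. It is a toy model, not any vendor's actual error model: it assumes that in every cycle each strand in a colony independently falls out of phase with a fixed probability (the hypothetical parameter `dephase_prob`) and never recovers. Under that assumption the in-phase fraction of a colony shrinks geometrically with the number of cycles, and the 0.5% per-cycle rate used below is purely illustrative.

```python
import math


def in_phase_fraction(cycles: int, dephase_prob: float) -> float:
    """Fraction of strands in a colony still in phase after `cycles` cycles.

    Toy assumption: each strand independently drops out of phase with
    probability `dephase_prob` per cycle and never gets back in step.
    """
    return (1.0 - dephase_prob) ** cycles


def max_usable_cycles(dephase_prob: float, min_fraction: float) -> int:
    """Largest cycle count at which at least `min_fraction` of strands are in phase."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - dephase_prob))


if __name__ == "__main__":
    # Illustrative rate only: 0.5% of strands lost per cycle.
    for cycles in (50, 150, 300, 1000):
        print(cycles, round(in_phase_fraction(cycles, 0.005), 3))
    print("cycles until half the colony is out of phase:",
          max_usable_cycles(0.005, 0.5))
```

Even a small per-cycle loss compounds quickly in this model, which is the intuition behind read lengths stalling at a few hundred bases for colony-based methods.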

Size Enhancement
Recognizing these limitations, NGS developers have sought to augment read length by enhancing their library processing protocols. Illumina recently acquired Moleculo, which had developed a method to extend read length by interspersing adapter sequences at intermediate points within the libraries, and the company has claimed reads of over 10kb through this process. Complete Genomics has taken an alternative approach, Long Fragment Read (LFR) technology, which segregates genomic DNA into smaller pools that are sequenced independently. Similar in principle to the older BAC-based sequencing of genomes, this technique is also capable of producing large continuous sequences.

Remember Single Molecule?
In principle, single molecule sequencing strategies (of which only Pacific Biosciences remains) offer a simple alternative: since each read is obtained from one DNA molecule, there is no way for the read to go out of phase. PacBio’s read length is now upwards of 20,000 bases. Although PacBio’s single-read accuracy has never gotten north of 75%, it can be pushed into the acceptable Q30 range by aggregating multiple reads. And although its total read output has never increased above 100,000, experience has shown read count to be easier to increase than read length. Given these observations, we can’t deny that PacBio might have something to bring to the current battle of sequencing platforms.
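The aggregation argument can be made concrete with a rough sketch in Python. The assumptions are mine, not PacBio's: errors are independent between passes over the same base, and the consensus is a simple majority vote in which a base counts as correct only if more than half of the passes called it correctly. The function names are made up for illustration. The Phred scale itself is standard: Q = -10·log10(error probability), so Q30 corresponds to one error in 1,000 bases.

```python
import math


def phred(error_prob: float) -> float:
    """Convert an error probability to a Phred quality score."""
    return -10.0 * math.log10(error_prob)


def consensus_accuracy(per_read_accuracy: float, passes: int) -> float:
    """Probability that a strict majority of independent passes is correct.

    Simplification: a base is right only if more than half of the passes
    called it correctly (binomial model, independent errors assumed).
    """
    p = per_read_accuracy
    needed = passes // 2 + 1
    return sum(
        math.comb(passes, k) * p**k * (1.0 - p) ** (passes - k)
        for k in range(needed, passes + 1)
    )


def passes_for_quality(per_read_accuracy: float, target_q: float, max_passes: int = 201):
    """Smallest odd number of passes whose consensus reaches `target_q`, or None."""
    for passes in range(1, max_passes + 1, 2):
        error = 1.0 - consensus_accuracy(per_read_accuracy, passes)
        if error <= 0.0 or phred(error) >= target_q:
            return passes
    return None


if __name__ == "__main__":
    print("3 passes at 75% per read:", consensus_accuracy(0.75, 3))
    print("odd passes needed for Q30:", passes_for_quality(0.75, 30.0))
```

With three passes at 75% per-read accuracy, this model already gives a consensus accuracy of 0.84375. Real consensus callers weigh more information than a bare majority vote, so treat the pass counts as an intuition pump rather than a prediction.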

The Biggest Stick
Based on these observations, it seems that we may be in for a new kind of race between sequencing platforms. Although Illumina leads out of the gate, it’s too early to discount dark horses like PacBio. As applications shift from research to the clinic, needs will change too, and time will reveal the next winner.
