Duplex sequencing

This method uses degenerate molecular tags in addition to sequencing adapters to recognize reads originating from each strand of DNA.

[3][4][5] Several library preparation strategies, such as molecular barcoding and circular consensus sequencing, have been developed to increase the accuracy of NGS platforms.

This process results in double-stranded library fragments that contain two random tags (α and β) on each side that are the reverse complement of each other (Figures 1 and 2).

If the family size is too small, the DCS cannot be assembled; if too many reads share the same tag, the data yield will be low.

Family size is determined by the amount of DNA template used for PCR amplification and by the fraction of a sequencing lane dedicated to the library.

To obtain the optimal family size, both the amount of DNA template and the dedicated sequencing lane fraction need to be adjusted.

The reads are then trimmed by removing the fixed 5-base pair sequence and 4 error-prone nucleotides located at the sites of ligation and end repair.
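As a rough illustration of this trimming step, the sketch below splits a raw read into its tag and its trimmed insert. It assumes a read layout of tag, then fixed sequence, then error-prone bases, then template DNA. The 12-base tag length is an assumption (half of the 24-base-pair combined tag used for grouping), and the function name `trim_read` is hypothetical.

```python
TAG_LEN = 12     # assumed: half of the 24-base combined tag
FIXED_LEN = 5    # fixed sequence that follows the tag
ERROR_PRONE = 4  # bases next to the ligation / end-repair site


def trim_read(seq, qual):
    """Return (tag, trimmed_seq, trimmed_qual) for one raw read.

    Assumes the layout: tag, fixed sequence, error-prone bases, insert.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    start = TAG_LEN + FIXED_LEN + ERROR_PRONE
    if len(seq) <= start:
        raise ValueError("read too short to trim")
    return seq[:TAG_LEN], seq[start:], qual[start:]
```

The returned tag would then be carried along with the read (for example in the read name) so that it is still available after alignment.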

The aligned reads that have the same 24-base pair tag sequence and genomic region are detected and grouped (family αβ and βα in Figure 2).
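The grouping can be sketched as a dictionary keyed on the combined tag and the genomic position. The record type `TaggedRead`, its field names, and the helper `partner_key` are hypothetical. The sketch also assumes that the combined tag is formed by concatenating the tag from each end of the fragment, so that the family from the opposite strand carries the two halves in swapped order (αβ versus βα).

```python
from collections import defaultdict
from typing import NamedTuple


class TaggedRead(NamedTuple):
    tag1: str   # tag taken from one end of the fragment
    tag2: str   # tag taken from the other end
    chrom: str  # chromosome or contig of the alignment
    pos: int    # alignment start position
    seq: str    # trimmed read sequence


def group_families(reads):
    """Group reads that share the combined tag and genomic position."""
    families = defaultdict(list)
    for r in reads:
        families[(r.tag1 + r.tag2, r.chrom, r.pos)].append(r.seq)
    return dict(families)


def partner_key(key):
    """Key of the family from the opposite strand (tag halves swapped)."""
    tag, chrom, pos = key
    half = len(tag) // 2
    return (tag[half:] + tag[:half], chrom, pos)
```

Looking up `partner_key(key)` in the returned dictionary gives the family that would be paired with `key` when both strands of the original molecule were sequenced.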

To remove errors that arise during PCR amplification or sequencing, mutations that are supported by fewer than 70% of the family members (reads) are filtered out of the analysis.
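A minimal sketch of this filter, assuming the reads of a family are equal-length strings that are already aligned to each other: at each position the most common base is kept only if at least 70% of the reads support it, and is otherwise masked as "N". The function name `family_consensus` and the minimum family size of 3 are illustrative assumptions, not values taken from the text.

```python
from collections import Counter


def family_consensus(family, min_fraction=0.7, min_size=3):
    """Build a consensus for one tag family, or return None if too small.

    family: list of equal-length read strings from the same tag family.
    min_size is a hypothetical cutoff chosen for illustration.
    """
    if len(family) < min_size:
        return None
    consensus = []
    for column in zip(*family):  # iterate position by position
        base, count = Counter(column).most_common(1)[0]
        supported = count / len(family) >= min_fraction
        consensus.append(base if supported else "N")
    return "".join(consensus)
```

For example, in a family of four reads a base seen in three of them (75%) is kept, while in a family of three reads a base seen in only two (about 67%) is masked.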

It increases NGS accuracy about 20-fold; however, this method relies on sequencing information from single strands of DNA and is therefore sensitive to errors introduced during the first round of PCR amplification or before it.

[1][2] The high error rate of standard NGS platforms (0.01 to 0.001), which reflects errors introduced during sample preparation or sequencing, is a major limitation for the detection of variants present in a small fraction of cells.

[1][2][10] It is challenging to identify rare variants accurately using standard NGS methods, which have an error rate of 10⁻² to 10⁻³.

An example of such errors is the C>A/G>T transversion, which is detected at low frequencies in deep sequencing or targeted capture data and arises from DNA oxidation during sample preparation.

Duplex sequencing can theoretically detect mutations with frequencies as low as 10⁻⁸, compared with the error rate of about 10⁻² for standard NGS methods.

[1][2][10] Another advantage of duplex sequencing is that it can be used in combination with the majority of NGS platforms without making significant changes to the standard protocols.

However, the application of duplex sequencing for larger DNA targets will be more feasible when the cost of NGS decreases.

Duplex sequencing overview: Duplex-tagged libraries containing sequencing adapters are amplified, resulting in two types of products, each originating from a single strand of DNA. After the PCR products are sequenced, the generated reads are divided into tag families based on genomic position, duplex tags, and the neighboring sequencing adapter. Sequence tag α is the reverse complement of sequence tag β and vice versa.
Duplex sequencing library preparation workflow: Two adapter oligos go through several steps (Annealing, Synthesis, dT-tailing) to generate double-stranded unique tags with 3'-dT-overhangs. Then the duplex tag adapters ligate to the double-stranded DNA templates. Finally, Illumina sequencing adapters are inserted into the tagged-DNA fragments and form the final libraries containing DS adapters, Illumina sequencing adapters, and template DNA.