A typical sequencing experiment involves fragmentation of the genome into millions of molecules, which are size-selected and ligated to adapters.
[5] For example, longer read lengths improve the resolution of de novo genome assembly and detection of structural variants.
It is estimated that read lengths greater than 100 kilobases (kb) will be required for routine de novo human genome assembly.
[6] Bioinformatic pipelines to analyze sequencing data usually take into account read lengths.
Single- or double-stranded nucleic acids store this information in a linear or circular sequence.
Third-generation sequencing (TGS) refers to methods that are capable of sequencing single DNA molecules without amplification.
The longest read length ever generated by a third-generation sequencing technology is 2 million base pairs.
These reference genomes can be used to guide resequencing efforts in the same species by serving as a read mapping template.
Although this method is cost-effective, the reads are short and the repeat regions are long, which results in fragmented genome assemblies.
Long reads are capable of resolving the ordering of repeat regions, although they have a high error rate (15–18%).
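The effect of read length on repeat resolution can be illustrated with a small sketch. It assumes an idealized model that is not taken from the source: a read resolves a repeat only if it covers the whole repeat plus at least `flank` bases of unique sequence on each side. The function names, the `flank` parameter, and the example coordinates are hypothetical.

```python
def spans_repeat(read_start, read_length, repeat_start, repeat_length, flank=1):
    """Return True if the read covers the repeat plus `flank` bases on each side.

    Coordinates are 0-based; end positions are exclusive.
    """
    read_end = read_start + read_length
    repeat_end = repeat_start + repeat_length
    return read_start <= repeat_start - flank and read_end >= repeat_end + flank


def count_spanning_starts(genome_length, read_length, repeat_start, repeat_length, flank=1):
    """Count the read start positions from which a read would span the repeat."""
    return sum(
        spans_repeat(start, read_length, repeat_start, repeat_length, flank)
        for start in range(0, genome_length - read_length + 1)
    )


# Hypothetical example: a 500-base repeat at position 4000 of a 10,000-base genome.
short_read_hits = count_spanning_starts(10_000, 150, 4_000, 500)    # 0
long_read_hits = count_spanning_starts(10_000, 2_000, 4_000, 500)   # 1499
```

In this toy model no 150-base read can anchor both sides of the 500-base repeat, whereas 2,000-base reads can do so from many start positions.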
To correct errors in third-generation sequencing reads, a number of computational methods have been devised.
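One simple idea behind such correction is consensus: if several reads cover the same region, random errors tend to be outvoted. The following is a minimal sketch, not a description of any particular published tool. It assumes the reads are already aligned, have equal length, and contain only substitution errors; real reads also contain insertions and deletions, so practical tools must align reads before voting. The function name `consensus` and the example reads are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the per-column majority base of equal-length, pre-aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must have the same length")
    result = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the most frequent base in this column.
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Hypothetical reads covering the same 8-base region.
reads = [
    "ACGTTGCA",
    "ACGTAGCA",  # substitution at index 4
    "ACCTTGCA",  # substitution at index 2
    "ACGTTGCA",
    "ACGTTGGA",  # substitution at index 6
]
corrected = consensus(reads)  # "ACGTTGCA"
```

Because each error occurs in a different read, every column still has a clear majority and the original sequence is recovered.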
[17] Another challenge with short-read sequencing (SRS) is the detection of large sequence changes, which is a major roadblock to studying structural variations.
The availability of long reads is a great advantage: with next-generation sequencing (NGS) it is often difficult to generate a long continuous consensus sequence, because overlaps between short NGS reads are hard to detect, which lowers the overall quality of the assembly.
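The overlap problem can be made concrete with a toy suffix-prefix overlap check. This is a minimal sketch, assuming error-free reads and exact matching; the function name `overlap_length`, the `min_length` parameter, and the example reads are hypothetical and do not describe a specific assembler.

```python
def overlap_length(a, b, min_length=3):
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_length` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, max(min_length, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# An unambiguous overlap between two reads from unique sequence.
unique_overlap = overlap_length("ACGTTGCA", "TGCAGGTC")  # 4

# A read ending in a short repeat overlaps two different continuations equally well.
repeat_read = "GGCACACA"
candidate_1 = "CACACATT"
candidate_2 = "CACACAGG"
overlap_1 = overlap_length(repeat_read, candidate_1)  # 6
overlap_2 = overlap_length(repeat_read, candidate_2)  # 6
```

Both candidates overlap the repeat-derived read by the same six bases, so the overlap alone cannot tell which continuation is correct. A read long enough to extend past the repeat into unique sequence would remove this ambiguity.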
[19][20] Another advantage of long-read sequencing (LRS) over NGS is that it can characterize a variety of epigenetic marks at the same time as the DNA sequence.