Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing.
Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence.
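The role of overlaps in assembly can be illustrated with a toy greedy merger. This is a minimal sketch, not how any production assembler works: it assumes error-free reads from a single strand, and the function names (`overlap`, `greedy_assemble`), the `min_len` threshold, and the example reads are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard any sequence that is fully contained in another one.
        contigs = [c for c in contigs
                   if not any(c != o and c in o for o in contigs)]
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no overlaps left: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Two hypothetical rounds of fragmentation of the same toy sequence.
reads = ["AGCATGCTGCAGTCATGCT", "TAGGCTA",
         "AGCATG", "CTGCAGTCATGCTTAGGCTA"]
print(greedy_assemble(reads))
```

Neither round alone links its two fragments, but the reads from the second round bridge the break point of the first, which is why several rounds of fragmentation are needed.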
Even so, as of 2004, the methods then available had failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.
The first theoretical description of a pure pairwise end-sequencing strategy, assuming fragments of constant length, was published in 1991.
In 1995 Roach et al.[8] introduced the innovation of using fragments of varying sizes, and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets.
To apply the strategy, a high-molecular-weight DNA strand is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector.
Since the chain termination method can usually produce reads only 500 to 1000 bases long, mate pairs will rarely overlap in all but the smallest clones.
The distance between contigs can be inferred from the mate pair positions if the average fragment length of the library is known and has a narrow window of deviation.
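The gap inference described above amounts to simple arithmetic once the mates are placed. The following is a rough sketch under stated assumptions: each pair has one mate on the left contig pointing toward the gap and one on the right contig pointing back, coordinates are 0-based, and the library's mean fragment length is known. The function names and all numbers are hypothetical.

```python
from statistics import mean


def gap_from_mate_pair(insert_size: int, left_contig_len: int,
                       left_mate_start: int, right_mate_end: int) -> int:
    """Estimate the gap between two contigs from one mate pair.

    left_mate_start: position in the left contig where the fragment begins.
    right_mate_end:  position in the right contig where the fragment ends
                     (exclusive), measured from the start of that contig.
    """
    span_in_left = left_contig_len - left_mate_start  # fragment bases in left contig
    span_in_right = right_mate_end                    # fragment bases in right contig
    return insert_size - span_in_left - span_in_right


def estimate_gap(pairs, mean_insert_size: int, left_contig_len: int) -> float:
    """Average the per-pair estimates; more pairs give a steadier estimate."""
    return mean(gap_from_mate_pair(mean_insert_size, left_contig_len, s, e)
                for s, e in pairs)


# Hypothetical 10 kb library, left contig of 50 kb, three linking mate pairs.
pairs = [(46000, 3500), (47000, 4400), (45500, 3100)]
print(estimate_gap(pairs, mean_insert_size=10000, left_contig_len=50000))
```

A negative result would suggest that the two contigs actually overlap, or that the pair comes from a fragment far from the library mean, which is why a narrow size distribution matters.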
If the gap is small (5-20 kb), the region must be amplified by polymerase chain reaction (PCR) and then sequenced.
As sequence assembly programs become more sophisticated and computing power becomes cheaper, it may be possible to overcome this limitation.
For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x redundancy.
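The redundancy in this example is the total number of sequenced bases divided by the genome length. The short sketch below reproduces that arithmetic; the second function adds the common Poisson approximation for the fraction of bases left uncovered, which assumes reads land uniformly at random and is an idealization rather than a property of any real library. Function names are illustrative.

```python
import math


def coverage(num_reads: int, mean_read_length: float, genome_length: int) -> float:
    """Average redundancy: total bases sequenced / genome length."""
    return num_reads * mean_read_length / genome_length


def expected_uncovered_fraction(cov: float) -> float:
    """Poisson approximation: chance that a given base is hit by no read."""
    return math.exp(-cov)


c = coverage(num_reads=8, mean_read_length=500, genome_length=2000)
print(c)                                         # 2.0
print(round(expected_uncovered_fraction(c), 3))  # 0.135
```

Under this approximation, 2x redundancy would still leave roughly one base in seven unsequenced, which helps explain why much higher coverage is used in practice.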
A high coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly.
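One way to see why coverage helps is a majority vote over the reads stacked at each position. The sketch below assumes the reads have already been placed at known offsets and contain only substitution errors; real consensus callers also weigh base quality scores, which are ignored here. The function name and the toy reads are made up for illustration.

```python
from collections import Counter


def consensus(placed_reads, length: int) -> str:
    """Call each position by majority vote over the reads covering it.

    placed_reads: iterable of (offset, read) pairs, offset being 0-based.
    Positions with no coverage are reported as 'N'.
    """
    columns = [Counter() for _ in range(length)]
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# The second read carries a miscalled base (C instead of G at position 2).
placed = [(0, "ACGTAC"), (0, "ACCTAC"), (2, "GTACGT"),
          (1, "CGTACG"), (3, "TACGT")]
print(consensus(placed, length=8))  # ACGTACGT
```

With only the first two reads, position 2 would be an even split between G and C; the extra overlapping reads are what settle the call.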
[14] It was not widely accepted that a full-genome shotgun sequence of a large genome would provide reliable data.
The amplified genome is first sheared into larger pieces (50-200 kb) and cloned into a bacterial host using bacterial artificial chromosomes (BACs) or P1-derived artificial chromosomes (PACs).
A small radioactively or chemically labeled probe containing a sequence-tagged site (STS) can be hybridized onto a microarray upon which the clones are printed.
The end of one of these clones can then be sequenced to yield a new probe and the process repeated in a method called chromosome walking.
The process of extensive BAC library creation and tiling path selection, however, makes hierarchical shotgun sequencing slow and labor-intensive.
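Tiling path selection can be viewed as an interval-covering problem once the clones have been positioned on a physical map. The sketch below uses the standard greedy approach (always extend with the clone that reaches furthest) and assumes clone coordinates are already known and exact; it is an illustration of the idea, not the procedure used by any particular genome project. Clone names and coordinates (in kb) are hypothetical.

```python
def minimal_tiling_path(clones, target_start: int, target_end: int):
    """Pick a small set of clones that covers [target_start, target_end).

    clones: list of (name, start, end) tuples, end exclusive.
    Raises ValueError if some position is not covered by any clone.
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start position
    path, covered, i = [], target_start, 0
    while covered < target_end:
        best = None
        # Among clones starting at or before the covered edge,
        # keep the one that extends furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone covers position {covered}")
        path.append(best)
        covered = best[2]
    return path


clones = [("A", 0, 150), ("B", 100, 250), ("C", 120, 300),
          ("D", 280, 450), ("E", 200, 320)]
print([name for name, _, _ in minimal_tiling_path(clones, 0, 450)])
```

Here clones B and E are skipped because C and D cover the same region with fewer clones. A practical version would probably also require a minimum overlap between neighbouring clones so that each join can be verified.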
With millions of reads from next-generation sequencing of an environmental sample, it is possible to get a complete overview of any complex microbiome with thousands of species, such as the gut flora.