Shotgun sequencing

Multiple overlapping reads of the target DNA are obtained by performing several rounds of random fragmentation and sequencing.
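The minimal Python sketch below makes the idea of repeated rounds concrete. It simulates several rounds of random fragmentation on copies of one sequence. It assumes idealized, error-free reads that cover each fragment in full. The function names `fragment` and `shotgun_rounds` are hypothetical. Each round uses different random breakpoints, so fragments from different rounds overlap one another.

```python
import random


def fragment(seq, n_cuts, rng):
    """Cut seq at n_cuts distinct random breakpoints; return (start, piece) pairs."""
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0] + cuts + [len(seq)]
    return [(a, seq[a:b]) for a, b in zip(bounds, bounds[1:])]


def shotgun_rounds(seq, rounds, n_cuts, seed=0):
    """Fragment independent copies of seq several times (hypothetical helper).

    Within one round the pieces tile the sequence end to end. Breakpoints
    differ between rounds, so pieces from different rounds overlap. The
    returned start positions are known only because this is a simulation.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(rounds):
        reads.extend(fragment(seq, n_cuts, rng))
    return reads


genome = "ACGTTGCAAGGCTTACCGATGCATGCAAGT"
for start, read in shotgun_rounds(genome, rounds=3, n_cuts=4, seed=1):
    print(start, read)
```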

Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence.
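The assembly step can be illustrated with the greedy shortest-common-superstring heuristic. This heuristic repeatedly merges the pair of reads that has the longest suffix-prefix overlap. The Python below is a simplified teaching sketch and does not reproduce the algorithm of any particular assembler. It assumes error-free reads and a minimum overlap length (`min_len`) chosen only for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Greedily merge reads by maximal overlap; return the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that are fully contained in another read
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[i] + reads[j][olen:]
        rest = [r for k, r in enumerate(reads) if k not in (i, j)]
        # Drop any remaining read that is now contained in the merged contig
        reads = [r for r in rest if r not in merged] + [merged]
    return reads


print(greedy_assemble(["ACGGTAC", "GTACCTA", "CCTAGGT"]))
```

Repeats longer than the reads can lead the greedy choice to join reads in the wrong order or to collapse repeat copies. This is one reason many overlapping reads, and usually more sophisticated assembly methods, are needed.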

Even so, as of 2004, methods had failed to isolate or assemble reliable sequence for approximately 1% of the euchromatic human genome.

The first theoretical description of a pure pairwise end-sequencing strategy, assuming fragments of constant length, appeared in 1991.

In 1995, Roach et al.[8] introduced the use of fragments of varying sizes and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets.

To apply the strategy, a high-molecular-weight DNA strand is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector.
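The Python sketch below models shearing and size selection in terms of fragment lengths only. The library sizes come from the text. The ±10% size window, the uniform random breakpoints, and the helper names `shear` and `size_select` are assumptions made for illustration. Real gel- or bead-based selection windows differ.

```python
import random

TARGET_SIZES_KB = (2, 10, 50, 150)  # library sizes named in the text


def shear(total_len_bp, n_fragments, rng):
    """Break a molecule at uniformly random points; return the fragment lengths (bp)."""
    cuts = sorted(rng.sample(range(1, total_len_bp), n_fragments - 1))
    bounds = [0] + cuts + [total_len_bp]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def size_select(lengths_bp, targets_kb=TARGET_SIZES_KB, tolerance=0.1):
    """Assign each fragment to the first library whose target it matches within ±tolerance.

    Fragments outside every window are discarded (assumed ±10% by default).
    """
    libraries = {t: [] for t in targets_kb}
    for length in lengths_bp:
        for t in targets_kb:
            if abs(length - t * 1000) <= tolerance * t * 1000:
                libraries[t].append(length)
                break
    return libraries


rng = random.Random(42)
libs = size_select(shear(5_000_000, 400, rng))
for size_kb, frags in libs.items():
    print(f"{size_kb} kb library: {len(frags)} fragments")
```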

Because the chain termination method can usually produce reads of only 500 to 1000 bases, mate pairs rarely overlap in all but the smallest clones.
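The arithmetic behind this point is simple. Two reads taken from opposite ends of a clone overlap only if their combined length exceeds the clone length. This Python sketch assumes equal read lengths and reads that start exactly at the clone ends. `mate_overlap` is a hypothetical helper name.

```python
def mate_overlap(clone_len_bp, read_len_bp):
    """Bases shared by two equal-length reads sequenced from opposite ends of one clone.

    The reads overlap only when together they are longer than the clone.
    """
    return max(0, 2 * read_len_bp - clone_len_bp)


# Library sizes from the text, with optimistic 1000-base reads
for clone_kb in (2, 10, 50, 150):
    print(f"{clone_kb} kb clone: {mate_overlap(clone_kb * 1000, 1000)} bases of overlap")
```

Even for the 2 kb library, 1000-base reads only just meet in the middle and share no bases. Only clones shorter than two read lengths produce overlapping mates.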

The distance between contigs can be inferred from the mate pair positions if the average fragment length of the library is known and fragment lengths vary only within a narrow range.
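Here is a minimal Python sketch of this inference. It assumes the two contigs are already oriented with contig A before contig B, that each mate pair maps uniquely, and that the library's mean insert length is known. The coordinate convention and the name `estimate_gap` are hypothetical choices made for illustration.

```python
from statistics import mean


def estimate_gap(len_a, mate_pairs, insert_mean):
    """Estimate the gap (bp) between contig A and contig B from spanning mate pairs.

    Each pair is (start_in_a, end_in_b): the 0-based start in A of the read
    pointing toward B, and the exclusive end coordinate in B of its mate.
    The clone covers (len_a - start_in_a) bases of A, then the gap, then
    end_in_b bases of B, so gap = insert_mean - (len_a - start_in_a) - end_in_b.
    Averaging over several pairs reduces the effect of insert-length spread.
    A negative result suggests the contigs actually overlap.
    """
    estimates = [insert_mean - (len_a - s) - e for s, e in mate_pairs]
    return mean(estimates)


# Two hypothetical mate pairs from a 10 kb library spanning a 50 kb contig A
print(estimate_gap(50_000, [(46_000, 3_000), (45_500, 3_400)], insert_mean=10_000))
```

Each individual estimate is only as precise as the spread of the library's insert lengths. This is why a narrow size distribution matters.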

If the gap is small (5-20 kb), then the region must be amplified by polymerase chain reaction (PCR) and then sequenced.

As sequence assembly programs become more sophisticated and computing power becomes cheaper, it may be possible to overcome this limitation.

For example, a hypothetical genome of 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x redundancy, since 8 × 500 / 2,000 = 2.
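The same calculation appears below in Python as coverage C = N × L / G, where N is the number of reads, L the average read length, and G the genome length. As an added illustration, the sketch also uses the Lander-Waterman (Poisson) approximation. Under that approximation, the expected fraction of bases covered by no read is e^(-C). This assumes reads land uniformly at random, which real libraries only approximate. The function names are hypothetical.

```python
import math


def coverage(genome_len, n_reads, mean_read_len):
    """Average coverage (redundancy) C = N * L / G."""
    return n_reads * mean_read_len / genome_len


def expected_uncovered_fraction(c):
    """Lander-Waterman approximation: a base is missed by every read with probability e^-C."""
    return math.exp(-c)


c = coverage(2000, 8, 500)
print(c, round(expected_uncovered_fraction(c), 3))
```

Under this model, even at 2x coverage roughly 13.5% of bases would be expected to remain unsequenced. This helps explain why much higher coverage is used in practice.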

High coverage is desirable in shotgun sequencing because it helps overcome errors in base calling and assembly.
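One way to see how depth compensates for base-calling errors is a simple majority-vote consensus over reads that have already been placed on the reconstructed sequence. In this Python sketch, the `(offset, sequence)` input format and the function name `consensus` are hypothetical. Real consensus callers also weigh base quality scores.

```python
from collections import Counter


def consensus(aligned_reads, length):
    """Majority-vote consensus from reads placed at known offsets.

    aligned_reads: list of (offset, sequence) pairs (hypothetical input format).
    Each position takes the most frequent base among the reads covering it.
    Positions that no read covers are reported as 'N'. A lone read (depth 1)
    cannot correct its own errors; the vote needs several overlapping reads.
    """
    columns = [Counter() for _ in range(length)]
    for offset, seq in aligned_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


reads = [
    (0, "ACGTA"),
    (0, "ACCTA"),   # base-calling error at position 2
    (2, "GTACGT"),
    (3, "TACGA"),   # base-calling error at position 7
    (5, "CGT"),
]
print(consensus(reads, 8))
```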

[14] It was not widely accepted that a full-genome shotgun sequence of a large genome would provide reliable data.

The amplified genome is first sheared into larger pieces (50-200 kb) and cloned into a bacterial host using bacterial artificial chromosomes (BACs) or P1-derived artificial chromosomes (PACs).

A small radioactively or chemically labeled probe containing a sequence-tagged site (STS) can be hybridized to a microarray on which the clones are printed.

The end of one of these clones can then be sequenced to yield a new probe, and the process is repeated; this method is called chromosome walking.
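The walking loop can be simulated in Python by treating clones as intervals on a chromosome. In the laboratory, clone coordinates are unknown and overlaps are detected only by probe hybridization. Here the hypothetical `(start, end)` coordinates stand in for those hybridization results. The rule of picking the hit that reaches furthest is a simplifying assumption, and `chromosome_walk` is a hypothetical name.

```python
def chromosome_walk(clones, seed, target_end):
    """Simulate chromosome walking over a clone library.

    clones: list of half-open (start, end) intervals (hypothetical coordinates).
    Starting from the seed clone, the last base of the current clone serves
    as the next probe. Among clones that "hybridize" (contain the probe
    position) and extend past the current clone, pick the one reaching
    furthest. Repeat until target_end is reached or the walk stalls.
    """
    path = [seed]
    current = seed
    while current[1] < target_end:
        probe = current[1] - 1  # end of the current clone gives the new probe
        hits = [c for c in clones if c[0] <= probe < c[1] and c[1] > current[1]]
        if not hits:
            break  # no clone extends the walk: a gap in the library
        current = max(hits, key=lambda c: c[1])
        path.append(current)
    return path


library = [(0, 100), (80, 200), (90, 150), (190, 300), (250, 400)]
print(chromosome_walk(library, seed=(0, 100), target_end=400))
```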

However, extensive BAC library creation and tiling path selection make hierarchical shotgun sequencing slow and labor-intensive.

With millions of reads from next-generation sequencing of an environmental sample, it is possible to obtain a comprehensive overview of a complex microbiome with thousands of species, such as the gut flora.

Simplified scheme of the shotgun sequencing technique: first, the DNA fragment is cut into small, overlapping pieces. Then each fragment is sequenced, and the complete sequence is assembled based on the similarity of the overlapping ends. Because of the presence of multiple repeated sequences, this approach can be applied only to small genomes or to fragments of bigger genomes.
In whole genome shotgun sequencing (top), the entire genome is sheared randomly into small fragments (appropriately sized for sequencing) and then reassembled. In hierarchical shotgun sequencing (bottom), the genome is first broken into larger segments. After the order of these segments is deduced, they are further sheared into fragments appropriately sized for sequencing.
A BAC contig that covers the entire genomic area of interest makes up the tiling path.