Reference genome

Rather than representing any single individual, a reference genome provides a haploid mosaic assembled from the DNA sequences of several donors.[1]

Reference genomes exist for many species of viruses, bacteria, fungi, plants, and animals.

A simple way to measure genome length is to count the number of base pairs in the assembly.[6]

Assembling a reference genome requires overlapping sequencing reads to build contigs, which are contiguous DNA regions of consensus sequence.
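The toy sketch below illustrates both ideas: overlapping reads are merged into contigs, and the assembly's length is then measured by counting base pairs. It is a simplified greedy merge (repeatedly joining the pair of sequences with the longest suffix-prefix overlap), not the method used by real assemblers, which rely on overlap or de Bruijn graphs and must handle sequencing errors, reverse complements, and repeats. The function names, the example reads, and the minimum-overlap parameter `min_len` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`,
    or 0 if no such overlap is at least `min_len` bases long."""
    start = 0
    while True:
        # Look for b's first `min_len` bases somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if everything from there to the end of a matches b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair until no overlaps remain.
    Assumes error-free reads from one strand; contained reads are not handled."""
    contigs = list(reads)
    while True:
        best_len, best_pair = 0, None
        for i, j in permutations(range(len(contigs)), 2):
            olen = overlap(contigs[i], contigs[j], min_len)
            if olen > best_len:
                best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # nothing left to merge: these are the contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


def assembly_length(contigs: list[str]) -> int:
    """Genome length measured as the total number of base pairs across contigs."""
    return sum(len(c) for c in contigs)


if __name__ == "__main__":
    reads = ["ATGGCGT", "CGTGCAA", "CAATCCGA", "GGGTTT"]
    contigs = greedy_assemble(reads)
    print(contigs)                   # ['GGGTTT', 'ATGGCGTGCAATCCGA']
    print(assembly_length(contigs))  # 22
```

The first three reads overlap by three bases each and collapse into one 16 bp contig, while the fourth read shares no overlap and remains a separate contig, so the assembly totals 22 bp.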

The Genome Reference Consortium (GRC) continues to improve reference genomes by building new assemblies that contain fewer gaps and by fixing misrepresentations in the sequence.

The original human reference genome was derived from thirteen anonymous volunteers from Buffalo, New York.

In several cases, individuals such as James D. Watson have had their genomes assembled using massively parallel DNA sequencing.[21][22]

For regions known to show large-scale variation, sets of alternate loci are assembled alongside the reference locus.[1]

According to the GRC website, its next assembly release for the human genome (version GRCh39) is currently "indefinitely postponed".

The consortium employed rigorous methods to assemble, clean, and validate complex repeat regions, which are particularly difficult to sequence.

The HapMap Project, active from 2002 to 2010, aimed to create a map of haplotypes and their most common variations among different human populations.[41][42][43][44]

The 1000 Genomes Project, carried out between 2008 and 2015, aimed to create a database covering more than 95% of the variation present in the human genome, with results intended for genome-wide association studies (GWAS) of diseases such as diabetes, cardiovascular disease, and autoimmune diseases.

As of August 2022, the NCBI database contains 71,886 partially or completely sequenced and assembled genomes from different species, including 676 mammals, 590 birds, and 865 fish.

It also includes 1,796 insect, 3,747 fungal, 1,025 plant, 33,724 bacterial, 26,004 viral, and 2,040 archaeal genomes.