[6][7] Most of the chromosomal sequences are produced by the activity of mobile genetic elements (MGEs) in plant genomes.
In plants, long terminal repeat (LTR) retrotransposons are predominant and constitute from 15%[9] to 90% of the genome.
At an estimated size of between 400 and 430 Mb, approximately four times larger than that of A. thaliana, rice has the smallest genome of the major cereal crops.
EnsemblPlants[16] is part of the Ensembl Genomes database and contains resources for a limited number of sequenced plant species (45 as of October 2017).
It mainly provides genome sequences, gene models, functional annotations and polymorphic loci.
For some of the plant species, additional information is provided including population structure, individual genotypes, linkage, and phenotype data.
Gramene[17] is an online database resource for plant comparative genomics and pathway analysis based on Ensembl technology.
In general, different strategies are used for sequencing and assembling large and complex genomes such as those of plants, depending on the technologies available at the time the project started.
Physical maps were constructed from clones with restriction fragment fingerprints, by comparing the fingerprint patterns and by hybridization or polymerase chain reaction (PCR).
Direct PCR products were used to clone remaining gaps, and YACs allowed the characterization of telomere sequences.
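The fingerprint comparison used to build physical maps can be illustrated with a minimal sketch. It assumes each clone is represented by a list of restriction fragment sizes and that two clones are called overlapping when a sufficient fraction of their bands match within a tolerance. The clone names, fragment sizes, tolerance and threshold below are hypothetical, and the scoring is a simplification rather than the statistic used by any particular mapping project.

```python
from itertools import combinations


def shared_bands(fp_a, fp_b, tolerance=2):
    """Count bands of fp_a that match a not-yet-used band of fp_b within tolerance."""
    remaining = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band may be matched only once
                break
    return shared


def overlapping_pairs(fingerprints, min_fraction=0.5, tolerance=2):
    """Return (clone, clone, shared band count) for pairs that look overlapping.

    Assumes every fingerprint contains at least one band.
    """
    pairs = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(sorted(fingerprints.items()), 2):
        shared = shared_bands(fp_a, fp_b, tolerance)
        fraction = shared / min(len(fp_a), len(fp_b))
        if fraction >= min_fraction:
            pairs.append((name_a, name_b, shared))
    return pairs


# Hypothetical fragment sizes (bp) for three clones.
fingerprints = {
    "cloneA": [500, 1200, 2300, 4100, 5600],
    "cloneB": [1201, 2299, 4100, 7000, 8200],
    "cloneC": [300, 900, 9500, 10400],
}
print(overlapping_pairs(fingerprints))
```

In this toy data set only cloneA and cloneB share enough bands to be placed next to each other on the map.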
The genome of maize (Zea mays), one of the most important crops in the world, was the last plant genome project based primarily on the Sanger BAC-by-BAC strategy.
[23] To assemble the genome of maize, a set of 16,848 minimally overlapping BAC clones derived from combinations of physical and genetic maps was selected and sequenced.
The Sanger clone-by-clone strategy has the advantage of working in small units, which reduces the complexity and computational requirements and minimizes problems associated with the misassembly of highly repetitive DNA. It is therefore an attractive solution for assembling plant genomes and other complex eukaryotic genomes.
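Choosing a set of minimally overlapping clones from a physical map can be viewed as an interval-cover problem. The following sketch uses a greedy selection and assumes each clone is given as a hypothetical `(name, start, end)` tuple in map coordinates. It is an illustration only and not the procedure used by the maize project.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover.

    At each step, among the clones that start at or before the position
    covered so far, pick the one that extends furthest to the right.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical BAC clones with map coordinates in kb.
clones = [
    ("bac1", 0, 150),
    ("bac2", 40, 180),
    ("bac3", 120, 300),
    ("bac4", 170, 320),
    ("bac5", 290, 450),
]
print(minimal_tiling_path(clones, 0, 450))
```

Here three of the five clones are enough to span the region, so only those three would need to be sequenced.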
The DNA is randomly sheared, and the cloned fragments are sequenced and assembled using computational methods.
This technology reduced the cost and the time associated with the construction of the maps, and it relies on computational resources.
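The computational assembly step of a shotgun approach can be sketched with a toy greedy overlap assembler. The example assumes error-free reads from a short hypothetical sequence without problematic repeats. Real assemblers must handle sequencing errors, both strands and repeats, so this is only an illustration of the principle.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap_length(a, b, min_overlap)
                    if length > best[0]:
                        best = (length, i, j)
        length, i, j = best
        if length == 0:
            return contigs  # no remaining overlaps of at least min_overlap
        merged = contigs[i] + contigs[j][length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical reads sampled from the sequence ATGCGTACGTTAGC, in random order.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

With these reads the assembler reconstructs the original sequence as a single contig. With repetitive input, the same greedy choice can join the wrong fragments, which is the misassembly risk that clone-by-clone approaches reduce.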
Later improvements of this strategy enabled the sequencing of Brachypodium distachyon,[30] Sorghum bicolor[31] and soybean.
The result is about 30% smaller than the genome size estimated by flow cytometry of isolated nuclei stained with propidium iodide (367 Mb).
[39] In general, long reads from TGS have relatively high error rates (≈10% on average)[40] and therefore repeated sequencing of the same DNA fragments is required.
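The benefit of repeated sequencing can be illustrated with a simple calculation. The sketch below assumes that errors in different passes over the same base are independent and occur at a fixed rate, and it treats the consensus as wrong whenever more than half of the passes are wrong. This is a simplification that does not model the error profile of any particular platform, and the function name is hypothetical.

```python
from math import comb


def majority_error(per_pass_error, passes):
    """Probability that more than half of `passes` independent calls are wrong.

    This is an upper bound on the error of a majority-vote consensus,
    because it assumes that all erroneous calls agree with each other.
    """
    needed = passes // 2 + 1
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(needed, passes + 1)
    )


# Per-pass error rate of about 10%, as quoted in the text.
for passes in (1, 3, 5, 9):
    print(passes, majority_error(0.10, passes))
```

Under these assumptions, three passes reduce the per-base error bound from 10% to about 2.8%, and five passes reduce it to below 1%.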
In addition, BioNano[47] optical maps with a total length of 649.7 Mb were used in the hybrid assembly pipeline together with the scaffolds obtained from the previous step.
The resulting scaffolds were anchored to a genetic map constructed from 15,417 single-nucleotide polymorphism (SNP) markers.
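Anchoring scaffolds to a genetic map can be sketched as follows. The example assumes that each marker hit is a hypothetical `(scaffold, linkage group, position in cM)` tuple, assigns each scaffold to the linkage group supported by most of its markers, and orders scaffolds by the median genetic position of those markers. The data and the placement rule are illustrative assumptions, not the method used in the project described above.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_scaffolds(marker_hits):
    """Place scaffolds on linkage groups and order them by median marker position."""
    by_scaffold = defaultdict(list)
    for scaffold, group, cm in marker_hits:
        by_scaffold[scaffold].append((group, cm))

    placements = defaultdict(list)
    for scaffold, hits in by_scaffold.items():
        # Linkage group supported by the largest number of markers.
        group = Counter(g for g, _ in hits).most_common(1)[0][0]
        position = median(cm for g, cm in hits if g == group)
        placements[group].append((position, scaffold))

    return {group: [s for _, s in sorted(items)] for group, items in placements.items()}


# Hypothetical SNP marker hits: (scaffold, linkage group, position in cM).
marker_hits = [
    ("scaf1", "LG1", 12.0),
    ("scaf1", "LG1", 15.5),
    ("scaf2", "LG1", 3.2),
    ("scaf2", "LG1", 4.0),
    ("scaf2", "LG2", 40.0),  # conflicting marker, outvoted by the LG1 hits
    ("scaf3", "LG2", 8.0),
    ("scaf3", "LG2", 9.5),
]
print(anchor_scaffolds(marker_hits))
```

A scaffold whose markers map to several linkage groups, such as the hypothetical scaf2 above, may indicate a chimeric join and would normally be inspected rather than simply outvoted.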