1000 Plant Genomes Project

The project successfully sequenced the transcriptomes (expressed genes) of 1,000 different plant species by 2014;[1][2] its final capstone products were published in 2019.

There have been efforts to determine the evolutionary relationships among known plant species,[12][13] but phylogenies (phylogenetic trees) built solely from morphological data, cellular structures, single enzymes, or only a few sequences (such as rRNA) can be prone to error.[14] Morphological features are especially misleading when two species look physically similar but are not closely related (for example, as a result of convergent evolution, which produces analogous rather than homologous traits), or when two closely related species look very different because, for example, they respond strongly to their environments.
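To illustrate how a tree can be inferred from sequence data rather than morphology, the sketch below computes pairwise p-distances (the fraction of aligned sites that differ) and clusters them with UPGMA, a simple textbook distance method. This is an illustration under stated assumptions, not the method used by the project: the four short, already-aligned sequences are invented, UPGMA assumes a roughly constant rate of evolution across lineages, and real analyses draw on many genes and model-based methods.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA; return a Newick topology (no branch lengths)."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of sequences in it
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        total = sizes[a] + sizes[b]
        # Distance from the new cluster to every other cluster is the size-weighted average.
        for c in sizes:
            if c not in (a, b):
                dist[frozenset((merged, c))] = (
                    sizes[a] * dist[frozenset((a, c))] + sizes[b] * dist[frozenset((b, c))]
                ) / total
        del sizes[a], sizes[b]
        sizes[merged] = total
    return next(iter(sizes)) + ";"


# Hypothetical, already-aligned toy sequences.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACCTAA",
    "D": "TCGAACCTTA",
}
print(upgma(seqs))  # ((A,B),(C,D));
```

Here A and B differ at only one of ten sites and are joined first, then C and D, giving the topology `((A,B),(C,D));`.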

Here, knowing the sequences of the plant genes involved in the metabolic pathway that produces the oil is a large first step toward such utilization.

A recent example of engineering a natural biochemical pathway is Golden Rice, in which the beta-carotene biosynthesis pathway was genetically modified so that beta-carotene, a precursor of vitamin A, is produced in large quantities in the grain, making the golden-colored rice a potential solution to vitamin A deficiency.

Plant biosynthetic pathways could also be used to mass-produce medicinal compounds, instead of relying on the laboratory organic synthesis by which most of them are currently made.

[20] A number of biotech companies are developing these channelrhodopsin proteins for medical purposes, and many of these optogenetic therapy candidates are in clinical trials to restore vision in people with retinal blindness.

Each of these machines initially had a capacity of 3 Gb per run (3 billion base pairs per sequencing run), which enabled fast and accurate sequencing of the plant samples.
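As a rough illustration of what a 3 Gb per run capacity means in practice, the arithmetic below converts it into reads per run and into the number of runs needed for a batch of samples. The read length (100 bp), the sequence yield per sample (500 Mb) and the sample count (1,000, one per species) are hypothetical values chosen for the example, not figures from the project, and the sketch ignores paired-end reads, failed runs and quality filtering.

```python
RUN_CAPACITY_BP = 3_000_000_000  # 3 Gb per run, as stated in the text


def reads_per_run(capacity_bp: int, read_length_bp: int) -> int:
    """Number of reads of a given length that fill one run."""
    return capacity_bp // read_length_bp


def runs_needed(n_samples: int, bp_per_sample: int, capacity_bp: int = RUN_CAPACITY_BP) -> int:
    """Runs required if each sample needs bp_per_sample and samples are pooled per run."""
    samples_per_run = capacity_bp // bp_per_sample
    if samples_per_run == 0:
        raise ValueError("a single sample needs more than one run")
    # Integer ceiling division: a partly filled run still counts as a run.
    return (n_samples + samples_per_run - 1) // samples_per_run


# Hypothetical values: 100 bp reads, 500 Mb of sequence per sample, 1,000 samples.
print(reads_per_run(RUN_CAPACITY_BP, 100))  # 30000000
print(runs_needed(1_000, 500_000_000))      # 167
```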

Besides species with the capacity to biosynthesize industrial compounds, plant species known or suspected to produce medically active chemicals (such as opium poppies, which produce opiates) were given high priority, in order to better understand how these chemicals are synthesized, explore their potential for commercial production, and discover new pharmaceutical options.

The number of protein-coding genes varies considerably among plant species, but most have tens of thousands or more, making each transcriptome a large collection of information.

[26] mRNA (messenger RNA) is collected from a sample, converted to complementary DNA (cDNA) by a reverse transcriptase enzyme, and then fragmented so that it can be sequenced.
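The steps above can be modeled on strings to make the base-pairing logic concrete. This is a simplified sketch, not a laboratory protocol: it follows the order given in the text (reverse transcription, then fragmentation), produces only the first cDNA strand, treats fragmentation as random cuts with a hypothetical length range of 5 to 8 bases (real fragments are far longer), and uses an invented transcript. The function names are illustrative only.

```python
import random

# Base pairing used by reverse transcriptase: each RNA base templates its DNA complement.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return first-strand cDNA, written 5'->3', for an mRNA written 5'->3'."""
    # The cDNA is antiparallel to its template, so read the mRNA backwards.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def fragment(seq: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Cut a sequence into consecutive pieces of random length (the last may be shorter)."""
    pieces = []
    start = 0
    while start < len(seq):
        size = rng.randint(min_len, max_len)
        pieces.append(seq[start:start + size])
        start += size
    return pieces


mrna = "AUGGCCAUUGUAAUGGGCCGCUGA"  # hypothetical toy transcript
cdna = reverse_transcribe(mrna)
pieces = fragment(cdna, 5, 8, random.Random(0))
print(cdna)
print(pieces)
```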

The type of tissue collected was determined by the expected location of biosynthetic activity; for example, if a process or chemical of interest was known to occur primarily in the leaves, leaf tissue was sampled.

A number of RNA-sequencing protocols were adapted and tested for different tissue types,[24] and these were openly shared via the protocols.io platform.

Many plant species (especially crop species)[29] are known to have undergone large genome-wide changes through whole-genome duplication.