Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions.
Subsequent technological advances since the late 1990s have repeatedly transformed the field and made transcriptomics a widespread discipline in biological sciences.
Transcriptome analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease.
Libraries of silkmoth mRNA transcripts were collected and converted to complementary DNA (cDNA) for storage using reverse transcriptase in the late 1970s.
[19][20] In 1995, one of the earliest sequencing-based transcriptomic methods was developed, serial analysis of gene expression (SAGE), which worked by Sanger sequencing of concatenated random transcript fragments.
[34][35] Microarray technology allowed the assay of thousands of transcripts simultaneously and at a greatly reduced cost per gene and labour saving.
[50] Serial analysis of gene expression (SAGE) was a development of EST methodology to increase the throughput of the tags generated and allow some quantitation of transcript abundance.
[21] The cap analysis gene expression (CAGE) method is a variant of SAGE that sequences tags from the 5’ end of an mRNA transcript only.
SAGE and CAGE methods produce information on more genes than was possible when sequencing single ESTs, but sample preparation and data analysis are typically more labour-intensive.
Spotted arrays use two different fluorophores to label the test and control samples, and the ratio of fluorescence is used to calculate a relative measure of abundance.
RNA-Seq refers to the combination of a high-throughput sequencing methodology with computational methods to capture and quantify transcripts present in an RNA extract.
[9] Both low-abundance and high-abundance RNAs can be quantified in an RNA-Seq experiment (dynamic range of 5 orders of magnitude)—a key advantage over microarray transcriptomes.
RNA-Seq methodology has constantly improved, primarily through the development of DNA sequencing technologies to increase throughput, accuracy, and read length.
Methods differ in the use of transcript enrichment, fragmentation, amplification, single or paired-end sequencing, and whether to preserve strand information.
[72] UMIs provide an absolute scale for quantification, the opportunity to correct for subsequent amplification bias introduced during library construction, and accurately estimate the initial sample size.
UMIs are particularly well-suited to single-cell RNA-Seq transcriptomics, where the amount of input RNA is restricted and extended amplification of the sample is required.
[85][86] A large number of reads are needed to ensure sufficient coverage of the transcriptome, enabling detection of low abundance transcripts.
Added to those considerations is that every species has a different number of genes and therefore requires a tailored sequence yield for an effective transcriptome.
[88][89][90] Transcriptomics methods are highly parallel and require significant computation to produce meaningful data for both microarray and RNA-Seq experiments.
Multiple short probes matching a single transcript can reveal details about the intron-exon structure, requiring statistical models to determine the authenticity of the resulting signal.
Raw data is examined to ensure: quality scores for base calls are high, the GC content matches the expected distribution, short sequence motifs (k-mers) are not over-represented, and the read duplication rate is acceptably low.
Legend: RAM – random access memory; MPI – message passing interface; EST – expressed sequence tag.
The final outputs of these analyses are gene lists with associated pair-wise tests for differential expression between treatments and the probability estimates of those differences.
[10][141] RNA-Seq approaches have allowed for the large-scale identification of transcriptional start sites, uncovered alternative promoter usage, and novel splicing alterations.
[142] RNA-Seq can also identify disease-associated single nucleotide polymorphisms (SNPs), allele-specific expression, and gene fusions, which contributes to the understanding of disease causal variants.
[145][146] RNA-Seq of human pathogens has become an established method for quantifying gene expression changes, identifying novel virulence factors, predicting antibiotic resistance, and unveiling host-pathogen immune interactions.
This technique enables the study of the dynamic response and interspecies gene regulatory networks in both interaction partners from initial contact through to invasion and the final persistence of the pathogen or clearance by the host immune system.
[156] Integration of RNA-Seq datasets across different tissues has been used to improve annotation of gene functions in commercially important organisms (e.g. cucumber)[157] or threatened species (e.g.
For example, a database of SNPs used in Douglas fir breeding programs was created by de novo transcriptome analysis in the absence of a sequenced genome.
This article was adapted from the following source under a CC BY 4.0 license (2017) (reviewer reports): Rohan Lowe; Neil Shirley; Mark Bleackley; Stephen Dolan; Thomas Shafee (18 May 2017).