De novo transcriptome assembly

[2] Transcriptomes have subsequently been created for chickpea,[3] planarians,[4] and Parhyale hawaiensis,[5] as well as for the brains of the Nile crocodile, the corn snake, the bearded dragon, and the red-eared slider, to name just a few.

[6] Examining non-model organisms can provide novel insights into the mechanisms underlying the "diversity of fascinating morphological innovations" that have enabled the abundance of life on planet Earth.

[7] In animals and plants, the "innovations" that cannot be examined in common model organisms include mimicry, mutualism, parasitism, and asexual reproduction.

[9] Once RNA is extracted and purified from cells, it is sent to a high-throughput sequencing facility, where it is first reverse transcribed to create a cDNA library.

This algorithm is more computationally intensive than de Bruijn graph methods, and it is most effective when assembling a smaller number of reads with a high degree of overlap.
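
The cost of an overlap-based approach can be illustrated with a minimal sketch, assuming the algorithm compares every read against every other read to find suffix-prefix overlaps. The function names, the `min_len` threshold, and the toy reads are hypothetical and are not taken from any particular assembler.

```python
from itertools import permutations


def overlap_length(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def overlap_graph(reads, min_len: int = 3):
    """Compare all ordered pairs of reads; the work grows quadratically
    with the number of reads, which is why this scales poorly."""
    edges = {}
    for a, b in permutations(reads, 2):
        n = overlap_length(a, b, min_len)
        if n:
            edges[(a, b)] = n
    return edges


# Toy reads (illustrative only)
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
edges = overlap_graph(reads)
```

With n reads the loop visits n × (n − 1) ordered pairs, so the sketch is practical only for small read sets, in line with the statement above.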

Because the k-mers are shorter than the reads, they can be hashed quickly, so operations on de Bruijn graphs are generally less computationally intensive.
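
As a rough illustration of why k-mer hashing is cheap, the following sketch builds a small de Bruijn graph in which each k-mer contributes one edge from its (k − 1)-base prefix to its (k − 1)-base suffix. The helper names, the choice of k = 4, and the toy reads are assumptions made for the example; production assemblers use far more compact data structures.

```python
from collections import defaultdict


def kmers(read: str, k: int):
    """Return every substring of length k in the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k: int):
    """Build a de Bruijn graph as a dict mapping each (k-1)-mer node
    to the set of (k-1)-mer nodes that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Dictionary (hash) lookup: no read-against-read comparison needed
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads (illustrative only); the shared k-mer "GGCG" links them
reads = ["ATGGCG", "GGCGTA"]
graph = de_bruijn(reads, 4)
```

Each read is processed once, and the two reads are joined through their shared k-mer without ever being compared to each other directly.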

[12] Following annotation, KEGG (Kyoto Encyclopedia of Genes and Genomes) enables visualization of metabolic pathways and molecular interaction networks captured in the transcriptome.

[13] In addition to being annotated for GO terms, contigs can also be screened for open reading frames (ORFs) in order to predict the amino acid sequence of proteins derived from these transcripts.

Short sequences (< 40 amino acids) are unlikely to represent functional proteins, as they are unable to fold independently and form hydrophobic cores.
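
ORF screening with a minimum-length filter can be sketched as follows. This is a simplified illustration, not the method of any specific tool: it scans only the three forward reading frames, assumes the standard genetic code, requires an ATG start and a stop codon, and uses the 40-amino-acid threshold mentioned above as a default. The function name `find_orfs` and its parameters are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(seq: str, min_aa: int = 40):
    """Return translated ORFs (start codon to stop codon) of at least
    `min_aa` residues from the three forward frames of `seq`."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        protein = []
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
            if aa == "*":
                if len(protein) >= min_aa:
                    orfs.append("".join(protein))
                protein = []
            elif protein or aa == "M":
                # Start collecting at the first methionine, then extend
                protein.append(aa)
    return orfs


# Toy contigs (illustrative only)
long_contig = "ATG" + "GCT" * 45 + "TAA"   # 46 residues before the stop
short_contig = "ATG" + "GCT" * 5 + "TAA"   # 6 residues before the stop
long_orfs = find_orfs(long_contig)
short_orfs = find_orfs(short_contig)
```

A fuller implementation would also scan the reverse complement and handle ORFs that run off the end of a contig, both of which this sketch deliberately omits.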

In simulations, Velvet can produce contigs with an N50 length of up to 50 kb using prokaryotic data and of 3 kb using mammalian bacterial artificial chromosomes (BACs).
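
For readers unfamiliar with the metric, N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. The short sketch below computes it under that common definition; the function name and the example lengths are illustrative and unrelated to the Velvet results quoted above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Illustrative contig lengths (arbitrary units)
contig_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
result = n50(contig_lengths)
```

Here the lengths sum to 32, and the two longest contigs (8 + 8) already cover half of that total, so the N50 is 8.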

Trans-ABySS (Assembly By Short Sequences) is a software pipeline written in Python and Perl for analyzing ABySS-assembled transcriptome contigs.

The Trans-ABySS algorithms can also estimate gene expression levels and identify potential polyadenylation sites and candidate gene-fusion events.

[22] Trinity[23] first divides the sequence data into a number of de Bruijn graphs, each representing transcriptional variations at a single gene or locus.