Open reading frame

In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between the start and stop codons.
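As an illustration only, the start-stop definition can be sketched in Python for the forward strand of a DNA string. The sketch assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA), reports each ORF from the first ATG after the preceding in-frame stop, and uses 0-based, end-exclusive coordinates. The function name `find_orfs` and the `min_codons` parameter are hypothetical and not taken from any particular tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...stop span on the forward strand.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    Only complete ORFs (with both a start and a stop codon) are reported.
    Internal ATGs are ignored, so each ORF begins at the first ATG after
    the previous in-frame stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # first ATG seen since the previous in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                # Count codons from the start codon up to, not including, the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 2, 14)]
```

The single hit is in the third frame (offset 2) and spans `ATGAAATTTTAG`. A full ORF finder would also scan the reverse complement.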

In the context of gene finding, the start-stop definition of an ORF applies only to spliced mRNAs, not to genomic DNA, because introns may contain stop codons and/or cause shifts between reading frames.
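The effect of introns can be seen in a small made-up example: a stop codon that lies in frame inside an intron ends the reading frame on the genomic sequence, but disappears once the exons are joined. The sequence, the exon coordinates, and the helper names `splice` and `first_in_frame_stop` are hypothetical, and the exon boundaries are given directly rather than predicted.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in the order given."""
    return "".join(genomic[s:e] for s, e in exons)


def first_in_frame_stop(seq: str) -> Optional[int]:
    """Return the index of the first stop codon in frame 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical gene: exon 1 + intron + exon 2.
genomic = "ATGGCC" + "GTCTAAACAG" + "GGCTTTTGA"
exons = [(0, 6), (16, 25)]
mrna = splice(genomic, exons)

print(first_in_frame_stop(genomic))  # 9: TAA inside the intron
print(first_in_frame_stop(mrna))     # 12: TGA at the end of exon 2
```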

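A more general definition treats an ORF as a stretch of sequence whose length is divisible by three and that is bounded by stop codons, without requiring a start codon.[1][4] This more general definition can be useful in transcriptomics and metagenomics, where a start or stop codon may not be present in the obtained sequences.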
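One way to apply the more general definition to partial sequences is to split a frame at its stop codons and keep the stop-free runs of codons, letting a run touch either end of the fragment. This is a sketch under that assumption (tools differ in how they treat open ends); the names `stop_to_stop_orfs` and `min_codons` are hypothetical, and only one forward frame is scanned per call.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq: str, frame: int, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return stop-free codon runs in one forward frame (0-based, end-exclusive).

    Runs may touch either end of the sequence, so fragments lacking a start or
    stop codon still yield candidates. Stop codons are excluded from the spans.
    """
    seq = seq.upper()
    spans = []
    run_start = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if (i - run_start) // 3 >= min_codons:
                spans.append((run_start, i))
            run_start = i + 3  # the next run begins after the stop codon
    # Trailing run: extends to the last complete codon of the fragment.
    end = frame + max(0, (len(seq) - frame) // 3) * 3
    if (end - run_start) // 3 >= min_codons:
        spans.append((run_start, end))
    return spans


print(stop_to_stop_orfs("AAACCCTAGGGGTTT", 0))  # [(0, 6), (9, 15)]
```

Both candidates flank the TAG codon, and neither contains an ATG, which a start-stop scan would have required.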

In an older study, fewer than 10% of the vertebrate mRNAs surveyed contained AUG codons upstream of the major ORF.[12]

Between 64% and 75% of experimentally identified translation initiation sites of small ORFs (sORFs) are conserved between the human and mouse genomes, which may indicate that these elements are functional.[13]

However, sORFs are often found only in minor mRNA isoforms, where they may escape selection; the high conservation of their initiation sites may be related to their location inside the promoters of the corresponding genes.
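To make the idea of AUG codons upstream of the major ORF concrete, the following sketch lists every AUG in the 5′ leader of an mRNA together with the end of the short upstream ORF it opens, if an in-frame stop codon exists. It is not the method used in the cited studies; the function name `upstream_orfs`, the example transcript, and the assumption that the position of the main start codon is already known are all illustrative.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def upstream_orfs(mrna: str, main_start: int) -> list[tuple[int, Optional[int]]]:
    """Return (start, end) for each AUG upstream of the main start codon.

    Coordinates are 0-based; `end` is the exclusive end of the first in-frame
    stop codon, or None if the frame reaches the transcript end without one.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA letters
    result = []
    for i in range(main_start):
        if mrna[i:i + 3] != "AUG":
            continue
        end = None
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        result.append((i, end))
    return result


# Made-up transcript: a 5' leader with one uORF (AUG CCC UAA), then the main ORF.
leader, main_orf = "GCAUGCCCUAAGC", "AUGGCCUAA"
print(upstream_orfs(leader + main_orf, main_start=len(leader)))  # [(2, 11)]
```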

Pairwise global alignment between the sequences makes it easy to detect different kinds of mutations, including single-nucleotide polymorphisms (SNPs).
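The classic algorithm for pairwise global alignment is Needleman–Wunsch dynamic programming. The sketch below is a textbook version with arbitrary scoring (match +1, mismatch -1, linear gap penalty -2), not the scoring of any particular tool, followed by a helper that reports aligned mismatches as candidate SNPs. The names `global_align` and `substitutions` are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell along an optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def substitutions(ref_aln: str, qry_aln: str) -> list[tuple[int, str, str]]:
    """List (ref_position, ref_base, query_base) for aligned mismatches.

    Positions are 0-based in the ungapped reference; gap columns are skipped.
    """
    out = []
    ref_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-" and q != "-" and r != q:
            out.append((ref_pos, r, q))
        if r != "-":
            ref_pos += 1
    return out


ref_aln, qry_aln = global_align("ATGGCCAAA", "ATGGACAAA")
print(substitutions(ref_aln, qry_aln))  # [(4, 'C', 'A')]
```

Insertions and deletions appear as gap columns and are deliberately not reported as substitutions; a real variant caller would report them separately.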

The output consists of the predicted peptide sequences in FASTA format, each with a definition line that includes the query ID, the translation reading frame, and the nucleotide positions where the coding region begins and ends.
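Output of this kind can be produced with a few lines of code. In the sketch below, the header layout (`frame=`, `start=`, `end=` fields with 1-based positions) and the 60-residue line width are assumptions for illustration, not the exact format of any specific tool; `format_prediction` is a hypothetical name.

```python
def format_prediction(query_id: str, frame: int, start: int, end: int,
                      peptide: str, width: int = 60) -> str:
    """Return one FASTA record for a predicted peptide.

    The definition line carries the query ID, the signed reading frame and the
    nucleotide start/end of the coding region (hypothetical layout).
    """
    header = f">{query_id} frame={frame:+d} start={start} end={end}"
    # Wrap the peptide into lines of at most `width` residues.
    lines = [peptide[i:i + width] for i in range(0, len(peptide), width)]
    return "\n".join([header, *lines]) + "\n"


print(format_prediction("contig_7", 2, 5, 16, "MKF"), end="")
# >contig_7 frame=+2 start=5 end=16
# MKF
```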

ORFik is an R package in Bioconductor for finding open reading frames and for using next-generation sequencing data to provide evidence supporting ORFs.

orfipy is particularly fast on data containing many small FASTA sequences, such as de novo transcriptome assemblies.

Sample sequence showing three different possible reading frames. Start codons are highlighted in purple, and stop codons are highlighted in red.
Example of a six-frame translation. The nucleotide sequence is shown in the middle, with forward translations above and reverse translations below. Two possible open reading frames in the sequence are highlighted.
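A six-frame translation like the one described above can be sketched as follows, assuming the standard genetic code, with `*` for stop codons and `X` for codons containing ambiguous bases. Conventions for numbering the reverse frames differ between tools; here frame -k is read from offset k-1 of the reverse complement. The names `translate` and `six_frame` are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict[str, str]:
    """Translate three forward frames and three reverse-complement frames."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in sorted(six_frame("ATGGCCTAA").items()):
    print(name, protein)
```

For `ATGGCCTAA` this gives `+1 MA*`, `+2 WP`, `+3 GL`, `-1 LGH`, `-2 *A` and `-3 RP`; candidate ORFs are then the stretches between stop symbols in each frame.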