Nucleic acid structure prediction

Many secondary structure prediction methods rely on variations of dynamic programming and therefore are unable to efficiently identify pseudoknots.

This is partly because the extra oxygen in RNA (the 2'-hydroxyl group on the ribose) increases the propensity for hydrogen bonding in the nucleic acid backbone.

Structure prediction methods can follow a purely theoretical approach or a hybrid one that incorporates experimental data.

[1][2] A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given only a nucleic acid sequence.

Secondary structure of small RNA molecules is largely determined by strong, local interactions such as hydrogen bonds and base stacking.

[7][8] One of the early attempts at predicting RNA secondary structure was made by Ruth Nussinov and co-workers, who developed a dynamic programming-based algorithm that maximized the length and number of a series of "blocks" (polynucleotide chains).
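The text describes this algorithm only briefly. The following is a minimal sketch of the maximal-pairing recurrence commonly associated with Nussinov's approach, not the original implementation. The function name `max_base_pairs`, the `min_loop` parameter, and the allowed pair set (Watson-Crick plus G-U wobble) are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style sketch).

    min_loop is an assumed minimum number of unpaired bases enclosed by a pair.
    """
    # Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the interval
            # into two independent subproblems.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The triple loop gives cubic running time in the sequence length, which is typical of this family of dynamic programming methods.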

Once the lowest free energy of the complete sequence is calculated, the exact structure of the RNA molecule is determined.
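How a structure follows from the lowest energy can be illustrated with a traceback over a dynamic programming table. This is a sketch under strong assumptions: the per-pair energies in `PAIR_ENERGY` are hypothetical placeholders rather than measured parameters, and the names `fold_min_energy` and `min_loop` are invented for this example. Real energy models are considerably richer.

```python
# Hypothetical per-pair energies; placeholder numbers, NOT measured values.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}

def fold_min_energy(seq, min_loop=3):
    """Return (lowest toy energy, dot-bracket structure) for seq."""
    n = len(seq)
    energy = [[0.0] * n for _ in range(n)]

    def e(i, j):
        # Empty or out-of-range intervals contribute no energy.
        return energy[i][j] if 0 <= i <= j < n else 0.0

    def options(i, j):
        # Every way to treat position j: unpaired, or paired with some k.
        yield e(i, j - 1), None
        for k in range(i, j - min_loop):
            pair = PAIR_ENERGY.get((seq[k], seq[j]))
            if pair is not None:
                yield e(i, k - 1) + pair + e(k + 1, j - 1), k

    # Fill the table from short intervals to long ones.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            energy[i][j] = min(value for value, _ in options(i, j))

    # Traceback: re-derive which option produced each optimal value.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        for value, k in options(i, j):
            if value == e(i, j):
                if k is None:
                    stack.append((i, j - 1))
                else:
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                break
    return e(0, n - 1), "".join(structure)
```

When several structures share the same lowest energy, this sketch returns only one of them, preferring to leave a position unpaired on ties.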

Secondary structures that fall into this category include double helices, stem-loops, and variants of the "cloverleaf" pattern found in transfer RNA molecules.

These methods rely on pre-calculated parameters which estimate the free energy associated with certain types of base-pairing interactions, including Watson-Crick and Hoogsteen base pairs.
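As a rough illustration of how pre-calculated parameters can be used, the sketch below sums lookup-table terms over consecutive base pairs of a helix. Scoring adjacent pairs together (a stacking-style lookup) is an assumption of this example, and the values in `TOY_STACK_ENERGY` are hypothetical placeholders; real tools load curated parameter sets. Only Watson-Crick pairs appear in the toy table.

```python
# Hypothetical stacking parameters (kcal/mol). Placeholder numbers for
# illustration only; they are NOT measured thermodynamic values.
TOY_STACK_ENERGY = {
    ("GC", "GC"): -3.0,
    ("GC", "AU"): -2.0,
    ("AU", "GC"): -2.0,
    ("AU", "AU"): -1.0,
}

def helix_energy(base_pairs, table=TOY_STACK_ENERGY, missing=0.0):
    """Sum table terms over consecutive base pairs of a single helix.

    base_pairs lists the pairs from the outside of the helix inward, each
    written as a two-letter string such as "GC". Pair combinations absent
    from the table contribute the value of `missing`.
    """
    total = 0.0
    for outer, inner in zip(base_pairs, base_pairs[1:]):
        total += table.get((outer, inner), missing)
    return total
```

A helix with a single pair has no adjacent pair to stack on, so it scores zero in this simplified scheme.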

[11] One of the issues when predicting RNA secondary structure is that the standard free energy minimization and statistical sampling methods cannot find pseudoknots.
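A pseudoknot can be characterized as a pair of base pairs that cross rather than nest: for pairs (i, j) and (k, l), the positions satisfy i < k < j < l. Standard dynamic programming recurrences split a sequence into independent intervals, which by construction excludes such crossings. The following small check, with the hypothetical function name `has_pseudoknot`, is a sketch of that definition.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l).

    pairs is an iterable of (position, position) tuples using 0-based indices.
    """
    # Normalize each pair so the smaller index comes first, then sort by start.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            # After sorting k >= i, so a crossing requires k inside (i, j)
            # and l outside it.
            if i < k < j < l:
                return True
    return False
```

For example, the nested pairs (0, 8), (1, 7), (2, 6) contain no crossing, while (0, 5) and (3, 8) do.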

This has prompted several researchers to implement versions of the algorithm that restrict classes of pseudoknots, resulting in performance gains.

[18] ILM (iterated loop matching), unlike the other algorithms for folding alignments, can return pseudoknotted structures.

[21] In essence, the Sankoff algorithm merges sequence alignment with the Nussinov[7] (maximal-pairing) folding dynamic programming method.

Some notable attempts at implementing restricted versions of Sankoff's algorithm are Foldalign,[23][24] Dynalign,[25][26] PMmulti/PMcomp,[22] Stemloc,[27] and Murlet.

RNA molecules also often contain post-transcriptionally modified nucleosides, which introduce new possible non-canonical interactions and so greatly complicate tertiary structure prediction.

[34] The alternative strategy is de novo modeling of RNA secondary structure,[35] which uses physics-based principles such as molecular dynamics[36] or random sampling of the conformational landscape[37] followed by screening with a statistical potential for scoring.

S. cerevisiae tRNA-PHE structure space: the energies and structures were calculated using RNAsubopt and the structure distances computed using RNAdistance.
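To give a sense of what a distance between two secondary structures can mean, the sketch below computes a simple base-pair distance between dot-bracket strings: the number of pairs present in one structure but not the other. This is an illustrative measure chosen for brevity and is not necessarily the one used for the figure; the function names `parse_pairs` and `base_pair_distance` are hypothetical.

```python
def parse_pairs(dot_bracket):
    """Convert a dot-bracket string into a set of (open, close) index pairs."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs

def base_pair_distance(a, b):
    """Count base pairs that occur in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    # Symmetric difference keeps pairs unique to either structure.
    return len(parse_pairs(a) ^ parse_pairs(b))
```

Identical structures have distance zero, and the distance grows by one for each pair that must be opened or closed to turn one structure into the other.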