[1] At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.
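Accuracy figures of this kind are commonly expressed as a root-mean-square deviation (RMSD) between predicted and experimental coordinates, usually of Cα atoms, after optimal rigid superposition. The sketch below computes that quantity with the Kabsch algorithm. It is illustrative only: the function name `kabsch_rmsd` and the synthetic coordinates are hypothetical, and it assumes a one-to-one correspondence between model and native atoms.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation: center both point sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P_c.T @ Q_c)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Apply R to each row of P_c and measure the residual deviation from Q_c.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P_c)))


# Usage with synthetic coordinates: a rotated and translated copy of a
# hypothetical "native" trace should superpose with an RMSD of about 0.
rng = np.random.default_rng(0)
native = rng.normal(scale=10.0, size=(50, 3))
theta = np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
model = native @ Rz.T + np.array([5.0, -3.0, 2.0])
rmsd = kabsch_rmsd(model, native)
print(f"RMSD = {rmsd:.3f} A, within 1.5 A: {rmsd <= 1.5}")
```

The reflection check matters: without it, the SVD can return an improper rotation that mirrors the model, which would understate the RMSD of a chirally wrong prediction.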
[2] De novo methods (the term was first coined by William DeGrado[3]) tend to require vast computational resources and have therefore been applied only to relatively small proteins.
De novo structure prediction for larger proteins will require better algorithms and larger computational resources, such as powerful supercomputers (e.g., Blue Gene or MDGRAPE-3) or distributed computing projects (e.g., Folding@home, Rosetta@home, the Human Proteome Folding Project, or Nutritious Rice for the World).
[5] In light of experimental limitations, devising efficient computer programs to close the gap between the number of known protein sequences and the number of experimentally determined structures is believed to be the only feasible option.
Research into de novo structure prediction has focused primarily on three areas: alternative lower-resolution representations of proteins, accurate energy functions, and efficient sampling methods.
A general paradigm for de novo prediction involves sampling conformation space, guided by scoring functions and other sequence-dependent biases, so that a large set of candidate ("decoy") structures is generated.
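The sample-score-rank loop can be sketched on a deliberately tiny problem. This example swaps in the two-dimensional HP lattice model as the low-resolution representation, a count of hydrophobic contacts as the scoring function, pivot moves for sampling and Metropolis Monte Carlo acceptance. The sequence, the function names (`hp_energy`, `pivot_move`, `sample_decoy`, `generate_decoys`) and all parameters are hypothetical; production methods use far richer representations, move sets and energy functions.

```python
import math
import random

# Hypothetical toy sequence: H = hydrophobic, P = polar (HP lattice model).
SEQUENCE = "HPHPPHHPHPPHPHHPPHPH"


def hp_energy(coords, seq):
    """Toy scoring function: -1 for each non-bonded H-H contact on the square lattice."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index.get(nb)
            # j > i + 1 counts each pair once and skips chain neighbours.
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def pivot_move(coords, rng):
    """Rotate the chain segment after a random pivot by 90, 180 or 270 degrees."""
    k = rng.randrange(1, len(coords) - 1)
    px, py = coords[k]
    turns = rng.choice((1, 2, 3))
    trial = list(coords[: k + 1])
    for x, y in coords[k + 1:]:
        dx, dy = x - px, y - py
        for _ in range(turns):
            dx, dy = -dy, dx  # 90-degree counter-clockwise rotation about the pivot
        trial.append((px + dx, py + dy))
    # Reject moves that make the chain overlap itself.
    return trial if len(set(trial)) == len(trial) else None


def sample_decoy(seq, steps, temperature, rng):
    """Metropolis Monte Carlo from an extended chain; returns the lowest-energy state seen."""
    coords = [(i, 0) for i in range(len(seq))]
    energy = hp_energy(coords, seq)
    best, best_energy = coords, energy
    for _ in range(steps):
        trial = pivot_move(coords, rng)
        if trial is None:
            continue
        e = hp_energy(trial, seq)
        # Always accept downhill moves; accept uphill ones with Boltzmann probability.
        if e <= energy or rng.random() < math.exp(-(e - energy) / temperature):
            coords, energy = trial, e
            if energy < best_energy:
                best, best_energy = coords, energy
    return best, best_energy


def generate_decoys(seq, n_decoys=10, steps=2000, temperature=1.0, seed=0):
    """Independent sampling runs produce a decoy set, ranked by the scoring function."""
    rng = random.Random(seed)
    decoys = [sample_decoy(seq, steps, temperature, rng) for _ in range(n_decoys)]
    return sorted(decoys, key=lambda d: d[1])


decoys = generate_decoys(SEQUENCE)
print("decoy energies:", [e for _, e in decoys])
```

Running several independent trajectories rather than one long one is what turns a single search into a decoy set; a later selection step (clustering or rescoring) would then pick among them.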
Second, several different human diseases, such as Duchenne muscular dystrophy, can be linked to loss of protein function resulting from a change in just a single amino acid in the primary sequence.
However, proteins routinely fold correctly within the body on short timescales, meaning that the process cannot be a random search and, therefore, can potentially be modeled.
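Why an unguided search is implausible can be made concrete with a back-of-the-envelope estimate in the spirit of Levinthal's paradox. The numbers are illustrative assumptions, not measurements: a 100-residue chain, 3 backbone states per residue, and 10^13 conformations sampled per second. The function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_search_years(n_residues, states_per_residue=3, conformations_per_second=1e13):
    """log10 of the years needed to enumerate every conformation once (toy assumptions)."""
    # Work in log10 so long chains do not overflow floating point.
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(conformations_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


print(f"~10^{log10_search_years(100):.1f} years")  # prints ~10^27.2 years
```

Under these assumptions an exhaustive search would take on the order of 10^27 years, whereas real proteins fold in milliseconds to seconds, so folding must follow a biased path through conformation space.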
One of the strongest lines of evidence for the hypothesis that all the information needed to encode a protein's tertiary structure is contained in its primary sequence was provided in the 1950s by Christian Anfinsen.
[10] By developing the QUARK program, Xu and Zhang showed that the structures of some proteins can be successfully constructed ab initio using a knowledge-based force field.
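A common way to build a knowledge-based (statistical) potential is the inverse Boltzmann relation, E(a, b) = -kT ln(P_obs(a, b) / P_ref(a, b)): residue pairs seen in contact more often than a reference state predicts receive favorable energies. The sketch below derives a pairwise contact potential from counts. It is not QUARK's actual force field; the function name, the random-mixing reference state, the pseudocount and the toy data are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def contact_potential(contacts, composition, pseudocount=1.0):
    """Inverse-Boltzmann pair energies, in units of kT:

        E(a, b) = -ln( P_obs(a, b) / P_ref(a, b) )

    contacts: iterable of (a, b) residue-type pairs observed in contact
    composition: Counter of residue-type frequencies (reference: random mixing)
    """
    types = sorted(composition)
    total_residues = sum(composition.values())
    frac = {t: composition[t] / total_residues for t in types}
    observed = Counter(tuple(sorted(pair)) for pair in contacts)
    pairs = list(combinations_with_replacement(types, 2))
    # Pseudocounts keep unseen pairs from producing log(0).
    denom = sum(observed.values()) + pseudocount * len(pairs)
    energies = {}
    for a, b in pairs:
        p_obs = (observed[(a, b)] + pseudocount) / denom
        # Unordered pairs of distinct types occur twice as often by chance.
        p_ref = frac[a] * frac[b] * (1.0 if a == b else 2.0)
        energies[(a, b)] = -math.log(p_obs / p_ref)
    return energies


# Hypothetical counts from a made-up set of structures with two residue types.
composition = Counter({"L": 40, "K": 60})
contacts = [("L", "L")] * 30 + [("L", "K")] * 15 + [("K", "K")] * 5
for pair, e in sorted(contact_potential(contacts, composition).items()):
    print(pair, round(e, 2))  # L-L is over-represented, so its energy is negative
```

The choice of reference state dominates the result in practice; random mixing by composition is only the simplest option.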
However, below this threshold three other classes of strategy are used to determine possible structure from an initial model: ab initio protein prediction, fold recognition, and threading.
For example, a distributed method was utilized by a team of researchers at the University of Washington and the Howard Hughes Medical Institute to predict the tertiary structure of the protein T0283 from its amino acid sequence.
Namely, ESMFold is a recently developed model that predicts protein structures solely from their amino acid sequences, building on a large protein language model.
In the CASP experiments, research groups are invited to apply their prediction methods to amino acid sequences whose native structures are not yet known but are about to be determined experimentally and published.
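Once the experimental structures are released, CASP predictions are scored against them; a standard CASP metric is GDT_TS, the average percentage of Cα atoms lying within 1, 2, 4 and 8 Å of their experimental positions. The sketch below is a simplification: the official measure (computed with the LGA program) searches over many superpositions to maximize each fraction, whereas this version assumes the model is already superposed onto the target, so it gives a lower bound. The function name and coordinates are hypothetical.

```python
import math


def gdt_ts_fixed(model, target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for two Calpha traces that are ALREADY superposed."""
    if not model or len(model) != len(target):
        raise ValueError("model and target must be non-empty and of equal length")
    dists = [math.dist(m, t) for m, t in zip(model, target)]
    # Fraction of residues within each distance cutoff, averaged over the cutoffs.
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical 10-residue target and a model displaced by known amounts along y.
target = [(3.8 * i, 0.0, 0.0) for i in range(10)]
offsets = [0.5] * 4 + [1.5] * 2 + [3.0] * 2 + [6.0, 10.0]
model = [(x, y + dy, z) for (x, y, z), dy in zip(target, offsets)]
print(round(gdt_ts_fixed(model, target), 2))  # 67.5
```

Averaging over several cutoffs makes the score less sensitive than RMSD to a few badly placed residues, such as a flexible terminus.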