AlphaFold

AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, that predicts protein structures.

On 15 July 2021, the AlphaFold 2 paper was published in Nature as an advance access publication, alongside open-source software and a searchable database of species proteomes.

[16] Hassabis and Jumper had previously won the Breakthrough Prize in Life Sciences and the Albert Lasker Award for Basic Medical Research in 2023 for their leadership of the AlphaFold project.

Protein structures can be determined experimentally through techniques such as X-ray crystallography, cryo-electron microscopy and nuclear magnetic resonance, which are all expensive and time-consuming.

The program uses a form of attention network, a deep learning technique in which the AI identifies parts of a larger problem, then pieces them together to obtain the overall solution.
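The general idea of attention can be illustrated with a minimal sketch of scaled dot-product attention, the generic building block of attention networks. This is not AlphaFold's own code or its specific attention variants; the function names and the toy two-dimensional "residue" vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Three hypothetical residue embeddings attending to one another (self-attention).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(x, x, x)
```

Each row of `result` mixes information from all three inputs, weighted by how similar they are to the row's own query, which is the sense in which the network relates separate parts of the problem to one another.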

AlphaFold 2 replaced this with a system of interconnected sub-networks, forming a single, differentiable, end-to-end model based on pattern recognition.

[24][25] After the neural network's prediction converges, a final refinement step applies local physical constraints using energy minimization based on the AMBER force field.
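The principle of energy minimization can be shown with a toy example. The sketch below is not the AMBER force field and not AlphaFold's refinement code: it assumes a single harmonic bond-length term between consecutive points and uses plain gradient descent. The equilibrium distance of 3.8 Å (roughly the spacing of consecutive alpha carbons), the force constant, and the step size are illustrative assumptions.

```python
import math


def bond_energy(coords, d0=3.8, k=1.0):
    """Sum of harmonic penalties k * (d - d0)^2 over consecutive points."""
    return sum(k * (math.dist(a, b) - d0) ** 2 for a, b in zip(coords, coords[1:]))


def minimize(coords, d0=3.8, k=1.0, step=0.05, iters=500):
    """Plain gradient descent on bond_energy; returns relaxed coordinates."""
    coords = [list(p) for p in coords]
    for _ in range(iters):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i in range(len(coords) - 1):
            a, b = coords[i], coords[i + 1]
            d = math.dist(a, b)
            if d == 0.0:
                continue  # direction undefined for coincident points; skip
            # dE/db = 2k (d - d0) (b - a) / d, and dE/da is its negative.
            coef = 2.0 * k * (d - d0) / d
            for j in range(3):
                g = coef * (b[j] - a[j])
                grads[i + 1][j] += g
                grads[i][j] -= g
        for p, g in zip(coords, grads):
            for j in range(3):
                p[j] -= step * g[j]
    return coords


# One bond too long (5.0 Å) and one too short (2.0 Å).
start = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 2.0, 0.0)]
relaxed = minimize(start)
```

Comparing `bond_energy(start)` with `bond_energy(relaxed)` shows the penalty falling as both bonds relax toward the assumed equilibrium length. A real force field adds many further terms, such as bond angles, torsions, and non-bonded interactions.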

In an example presented by DeepMind, the structure prediction module achieved a correct topology for the target protein on its first iteration, scored as having a GDT_TS of 78, but with a large number (90%) of stereochemical violations – i.e. unphysical bond angles or lengths.

This model begins with a cloud of atoms and iteratively refines their positions, guided by the Pairformer's output, to generate a 3D representation of the molecular structure.
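The "start from a cloud, refine step by step" idea can be sketched with a toy loop. This is not AlphaFold 3's diffusion module: `predict_clean` is a hypothetical stand-in for the trained, conditioned network, here replaced by a function that always returns a fixed target, and the 1/t blending schedule is an assumption chosen for simplicity.

```python
import random


def refine(positions, predict_clean, steps=50):
    """Iteratively move noisy positions toward a denoiser's current estimate."""
    current = [list(p) for p in positions]
    for t in range(steps, 0, -1):
        estimate = predict_clean(current)  # stand-in for a learned network
        blend = 1.0 / t                    # small moves early, full move at t == 1
        for p, e in zip(current, estimate):
            for j in range(3):
                p[j] += blend * (e[j] - p[j])
    return current


# Hypothetical target geometry and a random starting cloud (seeded for repeatability).
target = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
rng = random.Random(0)
cloud = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]
refined = refine(cloud, lambda current: target)
```

In a real diffusion model the estimate changes at every step because the network sees the current noisy positions and its conditioning inputs; the fixed target here only makes the loop's behaviour easy to follow.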

[33] In December 2018, DeepMind's AlphaFold placed first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP).

[42][19] but, as stated in the "Read Me" file on that website: "This code can't be used to predict structure of an arbitrary protein sequence.

[6] On the competition's preferred global distance test (GDT) measure of accuracy, the program achieved a median score of 92.4 (out of 100), meaning that more than half of its predictions were scored at better than 92.4% for having their atoms in more-or-less the right place,[45][46] a level of accuracy reported to be comparable to experimental techniques like X-ray crystallography.
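How a GDT_TS-style score is assembled can be sketched as follows. The score averages the percentage of residues whose predicted position lies within 1, 2, 4, and 8 Å of the experimental position. The sketch is simplified: it assumes the two structures are already superposed and given as matched lists of alpha-carbon coordinates, whereas the full procedure searches over superpositions to maximize each percentage. The function name and the example coordinates are illustrative.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each distance cutoff (in Å)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Four residues with deviations of 0.5, 1.5, 3.0 and 10.0 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (21.4, 0.0, 0.0)]
score = gdt_ts(predicted, reference)  # (25 + 50 + 75 + 75) / 4 = 56.25
```

A residue more than 8 Å from its experimental position contributes nothing at any cutoff, so large errors in one region do not influence the score beyond marking those residues as missed.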

To further validate AlphaFold 2, the conference organizers approached four leading experimental groups working on structures they found particularly challenging and had been unable to determine.

In all four cases the three-dimensional models produced by AlphaFold 2 were sufficiently accurate to determine structures of these proteins by molecular replacement.

The third exists in nature as a multidomain complex consisting of 52 identical copies of the same domain, a situation AlphaFold was not programmed to consider.

For all targets with a single domain, excluding only one very large protein and the two structures determined by NMR, AlphaFold 2 achieved a GDT_TS score of over 80.

[7] Nobel Prize winner and structural biologist Venki Ramakrishnan called the result "a stunning advance on the protein folding problem",[5] adding that "It has occurred decades before many people in the field would have predicted.

[52][53][54][55] A frequent theme was that the ability to predict protein structures accurately from the constituent amino acid sequence is expected to bring a wide variety of benefits to the life sciences, including accelerating advanced drug discovery and enabling a better understanding of diseases.

[57] In 2023, Demis Hassabis and John Jumper won the Breakthrough Prize in Life Sciences[18] as well as the Albert Lasker Award for Basic Medical Research for their management of the AlphaFold project.

[58] Hassabis and Jumper went on to win the Nobel Prize in Chemistry in 2024 for their work on “protein structure prediction”, sharing the prize with David Baker of the University of Washington.

[78][7] Results were reviewed by scientists at the Francis Crick Institute in the United Kingdom before being released to the broader research community.

[79] The team acknowledged that although these protein structures might not be the subject of ongoing therapeutic research efforts, they will add to the community's understanding of the SARS-CoV-2 virus.

Amino-acid chains, known as polypeptides, fold to form a protein.
AlphaFold 2 performance, experiments, and architecture [22]
Architectural details of AlphaFold 2 [22]
Results achieved for protein prediction by the best reconstructions in the CASP 2018 competition (small circles) and CASP 2020 competition (large circles), compared with results achieved in previous years.
The crimson trend-line shows how a handful of models including AlphaFold 1 achieved a significant step-change in 2018 over the rate of progress that had previously been achieved, particularly in respect of the protein sequences considered the most difficult to predict.
(Qualitative improvement had been made in earlier years, but it is only as changes bring structures within 8 Å of their experimental positions that they start to affect the CASP GDT_TS measure.)
The orange trend-line shows that by 2020 online prediction servers had been able to learn from and match this performance, while the best other groups (green curve) had on average been able to make some improvements on it. However, the black trend curve shows the degree to which AlphaFold 2 had surpassed this again in 2020, across the board.
The detailed spread of data points indicates the degree of consistency or variation achieved by AlphaFold. Outliers represent the handful of sequences for which it did not make such a successful prediction.