Statistical potential

The original method to obtain such potentials is the quasi-chemical approximation, due to Miyazawa and Jernigan.[2] It was later followed by the potential of mean force (statistical PMF [Note 1]), developed by Sippl.[3] Although the obtained scores are often considered as approximations of the free energy, and are thus referred to as pseudo-energies, this physical interpretation is incorrect.[4][5] Nonetheless, they are applied with success in many cases, because they frequently correlate with actual Gibbs free energy differences.[6]

A pseudo-energy can be assigned to a variety of structural features. The classic application is, however, based on pairwise amino acid contacts or distances, thus producing statistical interatomic potentials.

The energies are determined using statistics on amino acid contacts in a database of known protein structures (obtained from the PDB).
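As an illustration of how such contact statistics become pseudo-energies, here is a minimal hypothetical sketch (all counts are invented; this shows the inverse-Boltzmann idea in general, not the exact Miyazawa-Jernigan procedure): observed pair frequencies are compared with the frequencies expected if residues paired at random.

```python
from math import log

# Hypothetical contact counts harvested from a structure database:
# how often each residue-type pair occurs within a contact cutoff.
# Keys are alphabetically sorted residue pairs.
contact_counts = {
    ("ILE", "LEU"): 120, ("ILE", "ILE"): 80, ("LEU", "LEU"): 150,
    ("ILE", "LYS"): 30,  ("LEU", "LYS"): 40, ("LYS", "LYS"): 10,
}

kT = 0.593  # kcal/mol at ~298 K

total = sum(contact_counts.values())

# Marginal frequency of each residue type among contact endpoints.
residue_freq = {}
for (a, b), n in contact_counts.items():
    residue_freq[a] = residue_freq.get(a, 0) + n
    residue_freq[b] = residue_freq.get(b, 0) + n
residue_freq = {r: n / (2 * total) for r, n in residue_freq.items()}

def contact_energy(a, b):
    """Pseudo-energy: -kT ln(observed / expected under random pairing)."""
    observed = contact_counts[tuple(sorted((a, b)))] / total
    expected = residue_freq[a] * residue_freq[b]
    if a != b:
        expected *= 2  # a heterotypic pair can occur in two orderings
    return -kT * log(observed / expected)

print(f"ILE-LEU: {contact_energy('ILE', 'LEU'):+.3f} kcal/mol")
```

A pair observed more often than random chance predicts gets a negative (favorable) pseudo-energy; a depleted pair gets a positive one.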

Many textbooks present the statistical PMFs as proposed by Sippl[3] as a simple consequence of the Boltzmann distribution, as applied to pairwise distances between amino acids. The Boltzmann distribution applied to a specific pair of amino acids is given by:

P(r) = (1/Z) exp(-F(r)/kT)

where r is the distance between the two amino acids, k is Boltzmann's constant, T is the temperature and Z is the partition function. The quantity F(r) is the free energy assigned to the pairwise system. Simple rearrangement results in the inverse Boltzmann formula, which expresses the free energy F(r) as a function of P(r):

F(r) = -kT ln P(r) - kT ln Z

To construct a PMF, one then introduces a so-called reference state with a corresponding distribution Q_R(r) and partition function Z_R, and calculates the following free energy difference:

ΔF(r) = -kT ln ( P(r) / Q_R(r) ) - kT ln ( Z / Z_R )

The reference state typically results from a hypothetical system in which the specific interactions between the amino acids are absent; the second term, being a constant, is usually ignored.
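Dropping the constant partition-function term, the free energy difference reduces to the negative log-ratio of the observed and reference distance distributions. A minimal numerical sketch, with invented histograms:

```python
from math import log

kT = 0.593  # kcal/mol at ~298 K, used as the energy scale

# Hypothetical normalized distance histograms for one amino acid pair:
# P(r) observed in a structure database; Q_R(r) for the reference state.
distance_bins = [4.0, 5.0, 6.0, 7.0]   # bin centers in Angstrom
P   = [0.10, 0.35, 0.40, 0.15]         # observed distribution
Q_R = [0.20, 0.25, 0.30, 0.25]         # reference (no specific interactions)

# Delta F(r) = -kT ln(P(r) / Q_R(r)); the -kT ln(Z/Z_R) term is a
# constant offset and is conventionally ignored.
delta_F = [-kT * log(p / q) for p, q in zip(P, Q_R)]

for r, dF in zip(distance_bins, delta_F):
    print(f"r = {r:.1f} A: dF = {dF:+.3f} kcal/mol")
```

Distances enriched relative to the reference come out favorable (negative), depleted distances unfavorable (positive).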

However, the physical meaning of these statistical PMFs has been widely disputed since their introduction.

For liquids, the potential of mean force W(r) is related to the radial distribution function g(r) by:

W(r) = -kT ln g(r)

Here the reference state is clearly defined: it corresponds to the ideal gas, consisting of non-interacting particles. According to the reversible work theorem, the two-particle potential of mean force W(r) is the reversible work required to bring two particles in the liquid from infinite separation to a distance r from each other.[11]

Sippl justified the use of statistical PMFs, a few years after he introduced them for use in protein structure prediction, by appealing to the analogy with the reversible work theorem for liquids. For liquids, g(r) can be experimentally measured using small angle X-ray scattering; for proteins, the pairwise distance distribution is instead obtained from a database of known protein structures. Moreover, this analogy does not solve the issue of how to specify a suitable reference state for proteins.

In the mid-2000s, authors started to combine multiple statistical potentials, derived from different structural features, into composite scores.
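In the simplest case, such a composite score is a weighted sum of the individual potential terms. The sketch below is hypothetical (term names, scores and weights are invented placeholders); in practice the weights are fitted so that the composite score best ranks near-native models.

```python
# Hypothetical per-model scores from three statistical potentials,
# e.g. distance-dependent, torsion-based and solvent-exposure terms.
model_scores = {
    "model_1": {"distance": -12.4, "torsion": -3.1, "solvation": -1.8},
    "model_2": {"distance": -10.9, "torsion": -4.0, "solvation": -2.5},
}

# Weights would normally be fitted against known structures;
# these values are arbitrary placeholders.
weights = {"distance": 1.0, "torsion": 0.5, "solvation": 0.8}

def composite(scores):
    """Weighted linear combination of the individual potential terms."""
    return sum(weights[term] * value for term, value in scores.items())

# Lower (more negative) composite score = better-ranked model.
best = min(model_scores, key=lambda m: composite(model_scores[m]))
print(best, round(composite(model_scores[best]), 2))
```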

Probabilistic neural networks (PNNs) have also been applied to train a position-specific distance-dependent statistical potential.[13] In 2016, the DeepMind artificial intelligence research laboratory started to apply deep learning techniques to the development of a torsion- and distance-dependent statistical potential.[14] The resulting method, named AlphaFold, won the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) by correctly predicting the most accurate structure for 25 out of 43 free modelling domains.

Baker and co-workers[15] justified statistical PMFs from a Bayesian point of view and used these insights in the construction of the coarse grained ROSETTA energy function. According to Bayesian probability calculus, the conditional probability P(X | A) of a structure X, given the amino acid sequence A, can be written as:

P(X | A) = P(A | X) P(X) / P(A) ∝ P(A | X) P(X)

P(X | A) is thus proportional to the product of the likelihood P(A | X) and the prior P(X). By assuming that the likelihood can be approximated as a product of pairwise probabilities, and applying Bayes' theorem, the likelihood can be written as:

P(A | X) ≈ ∏_{i<j} P(a_i, a_j | r_ij) ∝ ∏_{i<j} P(r_ij | a_i, a_j) / P(r_ij)

where the product runs over all amino acid pairs i < j and r_ij is the distance between amino acids i and j. The negative of the logarithm of this expression has the same functional form as the classic pairwise distance statistical PMFs, with the denominator playing the role of the reference state.
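The correspondence between the Bayesian likelihood and the pairwise PMF form can be checked numerically: minus the log of a product of per-pair probability ratios equals the sum of per-pair inverse-Boltzmann-style scores (toy ratios, purely illustrative; kT taken as 1).

```python
from math import log, prod

# Hypothetical per-pair probability ratios P(r_ij | a_i, a_j) / P(r_ij)
# for three residue pairs of one candidate structure.
ratios = [1.8, 0.6, 1.2]

# Negative log of the product over pairs (the Bayesian likelihood form)...
score_from_product = -log(prod(ratios))

# ...equals the sum of the per-pair scores -ln(ratio), i.e. the familiar
# pairwise-distance statistical PMF form (with kT = 1).
score_from_sum = sum(-log(r) for r in ratios)

print(score_from_product, score_from_sum)
```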

This explanation has two shortcomings: it relies on the unfounded assumption that the likelihood can be expressed as a product of pairwise probabilities, and it is purely qualitative.

Hamelryck and co-workers [6] later gave a quantitative explanation for the statistical potentials, according to which they approximate a form of probabilistic reasoning due to Richard Jeffrey and named probability kinematics.

From this point of view, (i) it is not necessary to assume that the database of protein structures, used to build the potentials, follows a Boltzmann distribution, (ii) statistical potentials generalize readily beyond pairwise distances, and (iii) the reference ratio is determined by the prior distribution.

Expressions that resemble statistical PMFs naturally result from the application of probability theory to solve a fundamental problem that arises in protein structure prediction: how to improve an imperfect probability distribution over one variable using a probability distribution over a coarse-grained function of that variable.

This explanation is quantitative, and allows the generalization of statistical PMFs from pairwise distances to arbitrary coarse grained variables.

Conventional applications of pairwise distance statistical PMFs usually lack two features needed to make them fully rigorous: the use of a proper probability distribution over pairwise distances in proteins, and the recognition that the reference state is rigorously defined by the prior distribution.
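The reference ratio idea can be sketched numerically: a prior q(x) over fine-grained states is corrected by the factor p(y)/q(y), where y = f(x) is a coarse-grained feature, so that the updated distribution reproduces the target p(y) while otherwise retaining the structure of q(x). All distributions below are invented for illustration.

```python
# Prior q(x) over hypothetical fine-grained states (e.g. local conformations).
q_x = {"a1": 0.4, "a2": 0.2, "b1": 0.3, "b2": 0.1}

# Coarse-graining y = f(x): here, simply the first character of the label.
def f(x):
    return x[0]

# Target distribution p(y) over the coarse-grained variable
# (e.g. estimated from solved structures).
p_y = {"a": 0.3, "b": 0.7}

# q(y): the coarse-grained marginal implied by the prior.
q_y = {}
for x, w in q_x.items():
    q_y[f(x)] = q_y.get(f(x), 0.0) + w

# Reference ratio update: p(x) = [p(f(x)) / q(f(x))] * q(x).
# The marginal over y now matches p(y), while the relative weights
# of states within each coarse-grained class are inherited from q(x).
p_x = {x: p_y[f(x)] / q_y[f(x)] * w for x, w in q_x.items()}

print(p_x)
```

Taking minus the log of the correction factor p(y)/q(y) recovers an expression with the same form as a statistical PMF, with q playing the role of the reference state.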

Example of an interatomic pseudopotential between the β-carbons of isoleucine and valine residues, generated using MyPMFs.[1]
The reference ratio method. Q(X) is a probability distribution that describes the structure of proteins on a local length scale (right). Typically, Q(X) is embodied in a fragment library, but other possibilities are an energy function or a graphical model. In order to obtain a complete description of protein structure, one also needs a probability distribution P(Y) that describes nonlocal aspects, such as hydrogen bonding. P(Y) is typically obtained from a set of solved protein structures from the PDB (left). In order to combine Q(X) with P(Y) in a meaningful way, one needs the reference ratio expression (bottom), which takes the signal in P(Y) with respect to Q(X) into account.