Ancestral sequence reconstruction

[3] Thanks to improved algorithms and better sequencing and synthesis techniques, the method was developed further in the early 2000s to allow the resurrection of a greater variety of genes, including much more ancient ones.

This approach gives access to protein properties that may have arisen only transiently over evolutionary time, and it has recently been used to infer the selection pressures that may have shaped present-day sequences.

[9] The nascent field of 'evolutionary biochemistry' has been bolstered by a recent increase in ASR studies that use ancestral proteins to probe organismal fitness within particular cellular contexts, effectively testing ancestral proteins in vivo.

[8] Because of inherent limitations in these studies, chiefly the lack of suitably ancient genomes into which the ancestors can be placed, the small repertoire of well-characterized laboratory model systems, and the inability to mimic ancient cellular environments, very few in vivo ASR studies have been conducted.

Despite these obstacles, preliminary insights from a 2015 paper revealed that the 'ancestral superiority' observed in vitro for a given protein was not recapitulated in vivo.

Maximum parsimony (MP) is often considered the least reliable reconstruction method, as it arguably oversimplifies evolution to a degree that does not hold on billion-year timescales.
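To make concrete what a parsimony reconstruction computes, here is a minimal sketch of the bottom-up pass of Fitch's small-parsimony algorithm for a single alignment column on a fixed, rooted binary tree. The nested-tuple tree encoding and the function name `fitch` are hypothetical choices for illustration; practical ASR software works on whole alignments and, for ML or Bayesian methods, uses substitution models and branch lengths instead.

```python
from typing import Set, Tuple, Union

# Hypothetical encoding: a leaf is a one-letter residue string,
# an internal node is a (left, right) tuple of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Bottom-up pass of Fitch's algorithm for one alignment column.

    Returns the preliminary state set for this node (at the root, the set of
    most-parsimonious root states) and the minimum number of substitutions
    needed in the subtree below it.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        # The two children can share a state without an extra change.
        return shared, left_cost + right_cost
    # No shared state: keep the union and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1


tree = ((("A", "A"), "G"), ("G", "G"))
root_states, changes = fitch(tree)
print(root_states, changes)  # {'G'} 1
```

Every substitution costs exactly 1 here and branch lengths are ignored, which is arguably part of the simplification that leads MP to be viewed as less reliable over very long timescales.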

[11] Expressing consensus sequences and performing parallel ASR with non-maximum-likelihood (non-ML) methods is often required to rule out this hypothesis in each experiment.

Several studies have attempted to construct ancient scoring matrices through various methodologies and have compared the resulting sequences and the biophysical properties of the encoded proteins.

Although this method does not offer a rigorous statistical measure of reliability, it builds on the fundamental assumption of ASR that individual amino acid substitutions do not cause significant changes in a protein's biophysical properties, a tenet that must hold true in order to overcome the effect of inference ambiguity.

[1][6][14] There are many examples of ancestral proteins that have been computationally reconstructed, expressed in living cell lines, and – in many cases – purified and biochemically studied.

For this reason, the 'age' of an ASR protein should be used only as an indicative feature, and it is often replaced altogether by the number of substitutions between the ancestral and modern sequences (the basis on which the clock is calculated).

[9] That said, using a clock allows the observed biophysical data of an ASR protein to be compared with the geological or ecological environment of the time.
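The substitution count described above, and its rough conversion to an age under a strict molecular clock, can be sketched as follows. This is a simplified illustration: the sequences and the clock rate are hypothetical placeholders, gaps and multiple substitutions at the same site are ignored, and real analyses estimate branch lengths and rates from the whole phylogeny with substitution models and calibrations. As the text notes, such an age is only indicative.

```python
def count_substitutions(ancestor: str, modern: str) -> int:
    """Count aligned positions where two equal-length sequences differ (no gap handling)."""
    if len(ancestor) != len(modern):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != m for a, m in zip(ancestor, modern))


def clock_age(substitutions: int, length: int, rate_per_site_per_myr: float) -> float:
    """Rough age in million years along one lineage, assuming a strict clock.

    rate_per_site_per_myr is a hypothetical, user-supplied substitution rate
    (substitutions per site per million years); no multiple-hit correction is applied.
    """
    if length <= 0 or rate_per_site_per_myr <= 0:
        raise ValueError("length and rate must be positive")
    p_distance = substitutions / length  # fraction of sites that differ
    return p_distance / rate_per_site_per_myr


ancestor = "MKVLATGHER"  # hypothetical reconstructed ancestor
modern = "MKILASGHER"    # hypothetical modern descendant
subs = count_substitutions(ancestor, modern)
print(subs)                              # 2
print(clock_age(subs, len(modern), 0.0005))  # roughly 400 (million years)
```

Because the time estimate is a ratio of the observed distance to an assumed rate, any error in the rate propagates directly into the age, which is why the raw substitution count is often reported instead.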

For example, ASR studies on bacterial EF-Tu, a translation elongation factor that is likely rarely subject to horizontal gene transfer (HGT) and whose melting temperature (Tm) is typically about 2 °C above its environmental temperature (Tenv), indicate a hotter Precambrian Earth. This fits closely with geological data on ancient ocean temperatures based on oxygen-18 isotope levels.

[26] These experiments on receptors show that proteins diverge greatly during their evolution, which helps explain how complexity may evolve.

[11] ASR also promises to 'resurrect' phenotypically similar 'ancient organisms', which would in turn allow evolutionary biochemists to probe the history of life.

An illustration of a phylogenetic tree and how it helps conceptualise the way ASR is conducted.
Algorithm to reconstruct ancestral sequences 1, 2 and 3 (referring to the figure above). Ancestral sequence 1 can be reconstructed from B and C, as long as at least one outgroup is available, e.g. D or E. For example, sequences B and C differ at position 4, but since sequences D and E both have a C at that position, sequence 1 most likely had a C as well. Sequence 3 cannot be completely reconstructed without an additional outgroup sequence (uncertainty indicated by an "X").
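The position-by-position rule in the caption can be sketched in Python as below. This is a simplified illustration assuming aligned, equal-length sequences without gaps; the function name `reconstruct_ancestor` and the example sequences are hypothetical and are not the ones shown in the figure. Where the two descendants disagree, the outgroups decide which residue is kept; if no outgroup supports exactly one of the two residues, the position is marked "X", as in the caption.

```python
from typing import Sequence


def reconstruct_ancestor(child_a: str, child_b: str, outgroups: Sequence[str]) -> str:
    """Infer the common ancestor of two aligned sequences using outgroups."""
    length = len(child_a)
    if len(child_b) != length or any(len(og) != length for og in outgroups):
        raise ValueError("all sequences must be aligned to the same length")

    ancestor = []
    for i, (a, b) in enumerate(zip(child_a, child_b)):
        if a == b:
            # Both descendants agree: the ancestor most likely had this residue.
            ancestor.append(a)
            continue
        # Descendants disagree: keep whichever of their residues the outgroups show.
        supported = {res for res in (a, b) if any(og[i] == res for og in outgroups)}
        # Exactly one supported residue resolves the site; otherwise it stays uncertain.
        ancestor.append(supported.pop() if len(supported) == 1 else "X")
    return "".join(ancestor)


print(reconstruct_ancestor("ACGTA", "ACGCA", ["ACGCT", "ACGCA"]))  # ACGCA
print(reconstruct_ancestor("ACGTA", "ACGCA", []))                  # ACGXA
```

The second call shows why at least one outgroup is needed: without one, a site where the two descendants differ cannot be resolved.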