Ancestral reconstruction

Theodosius Dobzhansky and Alfred Sturtevant articulated the principles of ancestral reconstruction in a phylogenetic context in 1938, when inferring the evolutionary history of chromosomal inversions in Drosophila pseudoobscura.

Parsimony is an important exception to this paradigm: though it has been shown that there are circumstances under which it is the maximum likelihood estimator,[18] at its core, it is simply based on the heuristic that changes in character state are rare, without attempting to quantify that rarity.

Parsimony methods are intuitively appealing and highly efficient, such that they are still used in some cases to seed maximum likelihood optimization algorithms with an initial phylogeny.
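The parsimony heuristic can be made concrete with Fitch's algorithm, one classic maximum parsimony procedure for a single unordered character on a rooted binary tree. The sketch below assumes exactly that setting. A bottom-up pass builds a set of candidate states at each node and counts the minimum number of changes. A top-down pass then picks one most-parsimonious state per node. The tree, node names and tip states are hypothetical, chosen only to echo the pollination example in the figures.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical rooted binary tree: each internal node maps to its two children.
TREE: Dict[str, Tuple[str, str]] = {
    "root": ("n1", "n2"),
    "n1": ("A", "B"),
    "n2": ("C", "D"),
}
# Hypothetical observed states at the tips.
LEAVES: Dict[str, str] = {"A": "bee", "B": "bee", "C": "hummingbird", "D": "wind"}


def fitch(tree: Dict[str, Tuple[str, str]], leaf_states: Dict[str, str],
          root: str) -> Tuple[Dict[str, Set[str]], int]:
    """Bottom-up pass: candidate state sets and the minimum number of changes."""
    sets: Dict[str, Set[str]] = {}
    changes = 0

    def up(node: str) -> Set[str]:
        nonlocal changes
        if node not in tree:                  # tip: its observed state
            sets[node] = {leaf_states[node]}
        else:
            left, right = tree[node]
            a, b = up(left), up(right)
            if a & b:
                sets[node] = a & b            # children can agree: no change
            else:
                sets[node] = a | b            # children conflict: one change
                changes += 1
        return sets[node]

    up(root)
    return sets, changes


def assign_states(tree: Dict[str, Tuple[str, str]], sets: Dict[str, Set[str]],
                  root: str) -> Dict[str, str]:
    """Top-down pass: choose one most-parsimonious state for every node."""
    states: Dict[str, str] = {}

    def down(node: str, parent_state: Optional[str]) -> None:
        options = sets[node]
        # Keep the parent's state when possible; otherwise every member of the
        # set is equally parsimonious, so pick one deterministically.
        states[node] = parent_state if parent_state in options else min(options)
        for child in tree.get(node, ()):
            down(child, states[node])

    down(root, None)
    return states


sets, n_changes = fitch(TREE, LEAVES, "root")
print(n_changes)   # 2
print(assign_states(TREE, sets, "root"))
```

Using `min()` is an arbitrary tie-break. When several root states are equally parsimonious, fixing the root from outside evidence, such as a fossil-informed root state, can change the states assigned below it.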

Maximum likelihood has been shown to be quite reliable in reconstructing character states, but it is less accurate at estimating the stability of the reconstructed proteins.

[25] Unlike the other two methods, Bayesian inference yields a distribution of possible trees, allowing for more accurate and easily interpretable estimates of the variance of possible outcomes.

One of the first implementations of a Bayesian approach to ancestral sequence reconstruction was developed by Yang and colleagues,[29] where the maximum likelihood estimates of the evolutionary model and tree, respectively, were used to define the prior distributions.

When the size or complexity of the data makes this an unrealistic assumption, it may be more prudent to adopt the fully hierarchical Bayesian approach and infer the joint posterior distribution over the ancestral character states, model, and tree.

[42] Moreover, this fully Bayesian approach is limited to analyzing relatively small numbers of sequences or taxa, because the space of all possible trees rapidly becomes too vast, making it computationally infeasible for the Markov chain Monte Carlo (MCMC) samples to converge in a reasonable amount of time.
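The growth of tree space can be made concrete with the standard counts of bifurcating trees with labelled tips. There are (2n - 3)!! rooted topologies for n taxa and (2n - 5)!! unrooted ones. A minimal Python sketch follows. It counts topologies only. Branch lengths are continuous, so they enlarge the space further.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ...; equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_topologies(n_taxa: int) -> int:
    """Distinct rooted, bifurcating trees with n labelled tips: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return double_factorial(2 * n_taxa - 3)


def unrooted_topologies(n_taxa: int) -> int:
    """Distinct unrooted, bifurcating trees with n labelled tips: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return double_factorial(2 * n_taxa - 5)


for n in (5, 10, 20, 50):
    print(n, rooted_topologies(n))
# 10 taxa already give 34,459,425 rooted topologies.
```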

[43] In revisiting these experimental data, Oakley and Cunningham[44] found that maximum parsimony methods were unable to accurately reconstruct the known ancestral state of a continuous character (plaque size); these results were verified by computer simulation.

Studies of both mammalian carnivores[45] and fishes[46] have demonstrated that without incorporating fossil data, the reconstructed estimates of ancestral body sizes are unrealistically large.

[23] One may also use this substitution model as the basis for a Bayesian inference procedure, which would consider the posterior belief in the state of an ancestral node given some user-chosen prior.
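The following is a minimal sketch of such a calculation. It assumes a two-state continuous-time Markov model, like the one in the two-state Markov chain figure, with rates and branch lengths treated as known. Felsenstein's pruning algorithm computes the likelihood of the tip data for each possible root state. Bayes' rule then combines that likelihood with a user-chosen prior on the root state. The tree, rates and priors are hypothetical. A fuller analysis would estimate the rates or integrate over them, and posteriors for non-root nodes need an extra pass that is omitted here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical rooted tree: each internal node lists (child, branch length).
CHILDREN: Dict[str, List[Tuple[str, float]]] = {
    "root": [("n1", 0.1), ("C", 0.3)],
    "n1": [("A", 0.2), ("B", 0.2)],
}
LEAVES: Dict[str, int] = {"A": 1, "B": 1, "C": 0}   # observed binary states


def transition_matrix(q01: float, q10: float, t: float) -> List[List[float]]:
    """P[i][j]: probability of state j after time t when starting in state i,
    for a two-state chain with rates q01 (0 -> 1) and q10 (1 -> 0)."""
    total = q01 + q10
    pi0, pi1 = q10 / total, q01 / total          # stationary frequencies
    decay = math.exp(-total * t)
    return [[pi0 + pi1 * decay, pi1 * (1.0 - decay)],
            [pi0 * (1.0 - decay), pi1 + pi0 * decay]]


def partial_likelihood(node: str, children, leaves, q01: float,
                       q10: float) -> List[float]:
    """Pruning algorithm: L[i] = P(tip data below node | node in state i)."""
    if node in leaves:
        return [1.0 if i == leaves[node] else 0.0 for i in (0, 1)]
    like = [1.0, 1.0]
    for child, t in children[node]:
        p = transition_matrix(q01, q10, t)
        below = partial_likelihood(child, children, leaves, q01, q10)
        for i in (0, 1):
            like[i] *= sum(p[i][j] * below[j] for j in (0, 1))
    return like


def root_posterior(children, leaves, q01: float, q10: float,
                   prior: Sequence[float], root: str = "root") -> List[float]:
    """Posterior over the root state: prior times likelihood, normalised."""
    like = partial_likelihood(root, children, leaves, q01, q10)
    joint = [prior[i] * like[i] for i in (0, 1)]
    total = sum(joint)
    return [x / total for x in joint]


uniform = root_posterior(CHILDREN, LEAVES, 1.0, 1.0, [0.5, 0.5])
skewed = root_posterior(CHILDREN, LEAVES, 1.0, 1.0, [0.9, 0.1])
print([round(x, 3) for x in uniform], [round(x, 3) for x in skewed])
```

With the same data, a prior favouring state 0 pulls the posterior for the root toward state 0. This is how the user's prior belief enters the reconstruction.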

In all cases where ancestral trait reconstruction is used, findings should be corroborated by examining the biological data that support model-based conclusions.

[13] In horned lizards (genus Phrynosoma), ancestral reconstruction methods indicate that viviparity (live birth) has evolved multiple times.

Ancestor reconstruction based on squared-change parsimony (equivalent to maximum likelihood under Brownian motion character evolution[60]) indicates that horned lizards, one of the three main subclades of the lineage, have undergone a major evolutionary increase in the proportion of fast-oxidative glycolytic fibers in their iliofibularis muscles.
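Squared-change parsimony chooses internal-node values that minimise the sum over branches of (x_parent - x_child)^2 / v, where v is the branch length. The unweighted form sets every v to 1. Under Brownian motion, the log-likelihood of a set of ancestral values is, up to constants, proportional to minus this weighted sum. The minimiser is therefore also the joint maximum likelihood estimate when the root value is treated as a free parameter.

At the minimum, each internal value equals the mean of its neighbours weighted by 1 / v. That makes a simple Gauss-Seidel iteration a workable sketch for small trees. The tree, branch lengths and trait values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tree as (node, node, branch length) edges; A, B, C are tips.
EDGES: List[Tuple[str, str, float]] = [
    ("root", "n1", 1.0),
    ("n1", "A", 1.0),
    ("n1", "B", 1.0),
    ("root", "C", 2.0),
]
TIPS: Dict[str, float] = {"A": 0.0, "B": 2.0, "C": 5.0}


def squared_change_parsimony(edges, tip_values: Dict[str, float],
                             tol: float = 1e-12,
                             max_iter: int = 10_000) -> Dict[str, float]:
    """Internal-node values minimising sum((x_u - x_w)**2 / v) over all branches."""
    neighbours: Dict[str, List[Tuple[str, float]]] = {}
    for u, w, v in edges:
        neighbours.setdefault(u, []).append((w, v))
        neighbours.setdefault(w, []).append((u, v))

    values = dict(tip_values)                      # tips stay fixed
    internal = [n for n in neighbours if n not in tip_values]
    start = sum(tip_values.values()) / len(tip_values)
    for n in internal:
        values[n] = start                          # any starting guess works

    for _ in range(max_iter):
        largest_step = 0.0
        for n in internal:
            # Optimal value given the neighbours: their mean weighted by 1 / v.
            weight = sum(1.0 / v for _, v in neighbours[n])
            new = sum(values[m] / v for m, v in neighbours[n]) / weight
            largest_step = max(largest_step, abs(new - values[n]))
            values[n] = new
        if largest_step < tol:
            break
    return values


estimates = squared_change_parsimony(EDGES, TIPS)
print(round(estimates["root"], 4), round(estimates["n1"], 4))   # 2.7143 1.5714
```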

Thus stable models recover a more realistic picture of mammalian body mass evolution by permitting large transformations to occur on a small subset of branches.

[54] Phylogenetic comparative methods (inferences drawn through comparison of related taxa) are often used to identify biological characteristics that do not evolve independently, which can reveal an underlying dependence.

[62][63] Felsenstein identified this problem for continuous character evolution and proposed a solution similar to ancestral reconstruction, in which the phylogenetic structure of the data was accommodated statistically by directing the analysis through computation of "independent contrasts" between nodes of the tree related by non-overlapping branches.
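Felsenstein's procedure can be sketched as a post-order pass over a rooted binary tree, assuming Brownian motion. Each pair of sister nodes yields one standardised contrast, (x1 - x2) / sqrt(v1 + v2). The pair's weighted average is passed up as the parent's value. The parent's branch is then lengthened by v1 * v2 / (v1 + v2) to reflect the uncertainty of that average. A tree with n tips gives n - 1 contrasts. The tree and trait values are hypothetical. The sign of each contrast depends only on the arbitrary order of the two children.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical rooted binary tree: children of each internal node, the
# length of the branch above each non-root node, and tip trait values.
CHILDREN: Dict[str, Tuple[str, str]] = {"root": ("n1", "C"), "n1": ("A", "B")}
BRANCH: Dict[str, float] = {"n1": 1.0, "A": 1.0, "B": 1.0, "C": 2.0}
TIPS: Dict[str, float] = {"A": 0.0, "B": 2.0, "C": 5.0}


def independent_contrasts(children, branch, tips,
                          root: str) -> Tuple[List[float], float]:
    """Return the standardised contrasts (post-order) and the root estimate."""
    contrasts: List[float] = []

    def prune(node: str) -> Tuple[float, float]:
        # Returns (value estimate at node, extra length to add above node).
        if node in tips:
            return tips[node], 0.0
        left, right = children[node]
        x1, extra1 = prune(left)
        x2, extra2 = prune(right)
        v1 = branch[left] + extra1
        v2 = branch[right] + extra2
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
        estimate = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
        return estimate, v1 * v2 / (v1 + v2)

    root_value, _ = prune(root)
    return contrasts, root_value


contrasts, root_value = independent_contrasts(CHILDREN, BRANCH, TIPS, "root")
print([round(c, 4) for c in contrasts], round(root_value, 4))
# [-1.4142, -2.1381] 2.7143
```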

The developments of extensive genomic sequence databases in conjunction with advances in biotechnology and phylogenetic inference methods have made ancestral reconstruction cheap, fast, and scientifically practical.

[71] Although most ancestral reconstructions have dealt with proteins, the approach has also been used to test evolutionary mechanisms at the level of bacterial genomes[72] and primate gene sequences.

Brian Gaschen and colleagues proposed[74] that such reconstructed strains be used as targets for vaccine design efforts, as opposed to sequences isolated from patients in the present day.

Another team took this idea further by developing a center-of-tree reconstruction method to produce a sequence whose total evolutionary distance to contemporary strains is as small as possible.
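The geometric core of a center-of-tree approach can be sketched as one step: finding the point on a tree whose summed branch-length (patristic) distance to all tips is smallest. Along any branch this total changes linearly, so the minimum is always reached at a node, and it is enough to test every node. This illustrates that single step only. It is not the published method, and reconstructing the sequence at the chosen point is omitted. The tree and strain names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical unrooted tree: (node, node, branch length); s1..s5 are strains.
EDGES: List[Tuple[str, str, float]] = [
    ("s1", "x", 1.0), ("s2", "x", 1.0), ("s3", "x", 1.0),
    ("x", "y", 2.0), ("y", "s4", 1.0), ("y", "s5", 5.0),
]
STRAINS = ["s1", "s2", "s3", "s4", "s5"]


def tree_center(edges, tips: List[str]) -> Tuple[Optional[str], float]:
    """Node minimising the summed patristic distance to all tips."""
    adjacency: Dict[str, List[Tuple[str, float]]] = {}
    for u, w, length in edges:
        adjacency.setdefault(u, []).append((w, length))
        adjacency.setdefault(w, []).append((u, length))

    def distances_from(source: str) -> Dict[str, float]:
        # On a tree, each node is reached by exactly one path.
        dist = {source: 0.0}
        stack = [source]
        while stack:
            node = stack.pop()
            for nbr, length in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + length
                    stack.append(nbr)
        return dist

    best: Optional[str] = None
    best_total = float("inf")
    for node in adjacency:
        dist = distances_from(node)
        total = sum(dist[t] for t in tips)
        if total < best_total:
            best, best_total = node, total
    return best, best_total


print(tree_center(EDGES, STRAINS))   # ('x', 13.0)
```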

Similar experiments with synthetic ancestral sequences obtained by maximum likelihood reconstruction have likewise shown that these ancestors are both functional and immunogenic,[76][77] lending some credibility to these methods.

This method assumes a spatially explicit random walk model of migration to reconstruct ancestral locations given the geographic coordinates of the individuals represented by the tips of the phylogenetic tree.

[92] They examined genomes of several strains of fruit fly from different geographic locations, and observed that one configuration, which they called "standard", was the most common throughout all the studied areas.
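The logic of linking chromosome arrangements by inversions can be illustrated by writing each arrangement as a string of segment labels, similar to the lettered segments in the Drosophila figure. The test is whether two arrangements differ by exactly one inversion, that is, one reversed contiguous block. The arrangements below are hypothetical, not those of the original study; only the name "Standard" is borrowed from the text. The resulting links form an unrooted network, and deciding which arrangement is ancestral needs further reasoning.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def differ_by_one_inversion(a: str, b: str) -> bool:
    """True if b equals a with exactly one contiguous block reversed.

    Assumes each segment label occurs once per arrangement.
    """
    if len(a) != len(b) or a == b:
        return False
    diffs = [k for k in range(len(a)) if a[k] != b[k]]
    i, j = diffs[0], diffs[-1]
    # Outside [i, j] the arrangements already agree, so one inversion
    # explains the difference exactly when reversing a[i..j] yields b[i..j].
    return a[i:j + 1][::-1] == b[i:j + 1]


def inversion_links(arrangements: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs of arrangements that are one inversion apart."""
    return [
        (x, y)
        for x, y in combinations(arrangements, 2)
        if differ_by_one_inversion(arrangements[x], arrangements[y])
    ]


# Hypothetical arrangements written as segment labels.
ARRANGEMENTS = {
    "Standard": "ABCDEFGHI",
    "X": "ABFEDCGHI",   # Standard with segments C-F inverted
    "Y": "ABFEDCIHG",   # X with segments G-I inverted
}
print(inversion_links(ARRANGEMENTS))   # [('Standard', 'X'), ('X', 'Y')]
```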

In addition, LAZARUS is a collection of Python scripts that wrap the ancestral reconstruction functions of PAML for batch processing and greater ease-of-use.

For example, the ape package[102] in the statistical computing environment R also provides methods for ancestral state reconstruction for both discrete and continuous characters through the 'ace' function, including maximum likelihood.

Diversitree[105] is an R package providing methods for ancestral state reconstruction under Mk2 (a continuous-time Markov model of binary character evolution).

Finally, there are several web-server based applications that allow investigators to use maximum likelihood methods for ancestral reconstruction of different character types without having to install any software.

These advances have made it possible to generate a "deep" snapshot of the genetic composition of a rapidly evolving population, such as RNA viruses[119] or tumour cells,[120] in a relatively short amount of time.

This article was adapted from the following source under a CC BY 4.0 license (2015) (reviewer reports): Jeffrey B Joy; Richard H Liang; Rosemary M McCloskey; T Nguyen; Art Poon (12 July 2016).

Phylogeny of a hypothetical genus of plants with pollination states of either "bees", "hummingbirds" or "wind" denoted by pictures at the tips. The pollination states of nodes in the phylogenetic tree inferred under maximum parsimony are coloured on the branches leading into them (yellow represents "bee" pollination, red "hummingbird" pollination, and black "wind" pollination; dual-coloured branches are equally parsimonious for the two states coloured). Assigning "hummingbird" as the root state (because of prior knowledge from the fossil record) leads to the pattern of ancestral states represented by symbols at the nodes of the phylogeny; at each node, the state requiring the fewest changes to give rise to the pattern observed at the tips is circled.
A general two-state Markov chain representing the rates of jumps between allele a and allele A. The different types of jumps are allowed to have different rates.
Example of a four-state 1-parameter Markov chain model. Note that in this diagram, transitions between states A and D have been disallowed; it is conventional to omit the arrow rather than draw it with a rate of 0.
Graphical representation of an asymmetrical five-state 2-parameter Markov chain model.
Plots of 200 trajectories each of Brownian motion with drift (black) and of Ornstein-Uhlenbeck processes under two different parameter settings (green and orange).
Phylogeny of 7 regional strains of Drosophila pseudoobscura, as inferred by Sturtevant and Dobzhansky.[92] Displayed sequences do not correspond to the original paper, but were derived from the notation in the authors' companion paper[11] as follows: A (63A-65B), B (65C-68D), C (69A-70A), D (70B-70D), E (71A-71B), F (71A-73C), G (74A-74C), H (75A-75C), I (76A-76B), J (76C-77B), K (78A-79D), L (80A-81D). Inversions inferred by the authors are highlighted in blue along branches.