Horizontal or lateral gene transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance.
Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation.
[5] To infer HGT events, which may not necessarily result in phenotypic changes, most contemporary methods are based on analyses of genomic sequence data.
[7] The main feature of parametric methods is that they only rely on the genome under study to infer HGT events that may have occurred on its lineage.
For instance, the conflicting phylogenies can be the result of events not accounted for by the model, such as unrecognized paralogy due to duplication followed by gene losses.
[8] Larger sliding windows can account for this variability at the cost of a reduced ability to detect smaller HGT regions.
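The window-size trade-off can be illustrated with a minimal sketch of a sliding-window GC-content scan. The function names, the default window and step sizes, and the z-score cutoff are all assumptions chosen for illustration; they are not taken from any published tool.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc) for windows whose GC content deviates strongly
    from the genome-wide distribution of window GC values.

    window, step and z_cutoff are hypothetical defaults, not recommendations.
    """
    starts = list(range(0, len(genome) - window + 1, step))
    gcs = [gc_content(genome[s:s + window]) for s in starts]
    if not gcs:
        return []
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(s, gc) for s, gc in zip(starts, gcs)
            if abs(gc - mu) / sigma > z_cutoff]


# Toy genome: an AT-only background with a 1 kb GC-only insert.
toy_genome = "AATT" * 2500 + "GGCC" * 250 + "AATT" * 2500
flagged = atypical_windows(toy_genome, window=1000, step=1000)
```

A larger `window` averages out local compositional noise, but an insert much shorter than the window contributes only a small shift to the window's GC value and may fall below the cutoff.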
Also, the donor's composition must differ significantly from the recipient's for a transferred region to be identified as atypical, a condition that may not be met in short- to medium-distance HGT events, which are the most prevalent.
Furthermore, it has been reported that recently acquired genes tend to be AT-richer than the recipient's average,[15] which indicates that differences in GC-content signature may result from unknown post-acquisition mutational processes rather than from the donor's genome.
[17] The revealed similarities in periodicity were strong supporting evidence for a case of massive HGT between the bacterial and archaeal domains.
[39] A machine-learning approach combining oligonucleotide frequency scans with context information was reported to be effective at identifying genomic islands.
[40] In another study, the context was used as a secondary indicator, after genes strongly believed to be native or non-native had been removed using other parametric methods.
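The oligonucleotide frequency scans mentioned above can be sketched as a comparison between the k-mer profile of a genomic window and that of the whole genome. This is only an illustrative sketch: the function names, the choice of tetranucleotides (k = 4), and the use of a Manhattan distance are assumptions, and published methods use a variety of other signatures and distance measures.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequency of every A/C/G/T k-mer in seq (overlapping counts).

    k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    valid = set(all_kmers)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1)
                     if seq[i:i + k] in valid)
    total = sum(counts.values())
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


def signature_distance(window, genome, k=4):
    """Manhattan distance between two k-mer profiles (0 = identical, 2 = disjoint)."""
    pw, pg = kmer_profile(window, k), kmer_profile(genome, k)
    return sum(abs(pw[kmer] - pg[kmer]) for kmer in pg)


# Toy sequences with completely different tetranucleotide usage.
host_like = "AATT" * 100
foreign_like = "GGCC" * 100
d_same = signature_distance(host_like, host_like)
d_diff = signature_distance(foreign_like, host_like)
```

Windows with a large distance to the genome-wide profile would be the candidates for foreign origin; how large counts as atypical has to be calibrated for each genome.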
Similarly, in the presence of incomplete lineage sorting, explicit phylogeny methods can erroneously infer HGT events.
[42] That is why some explicit model-based methods test multiple evolutionary scenarios involving different kinds of events, and compare their fit to the data according to parsimony or probabilistic criteria.
By interpreting the edit path of pruning and regrafting, HGT candidate nodes can be flagged and the host and donor genomes inferred.
[49][56][57][58] Because converting one tree into another by a minimum number of SPR operations is NP-hard,[59] solving the problem becomes considerably more difficult as more nodes are considered.
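A single subtree prune and regraft (SPR) move can be sketched on rooted trees written as nested tuples. The representation and the function names are assumptions made for illustration, leaf labels are assumed to be unique, and the sketch performs one move only; it does not search for a minimal sequence of moves, which is the hard part of the problem.

```python
def prune(tree, subtree):
    """Remove subtree from tree, collapsing any node left with a single child."""
    if tree == subtree:
        return None
    if not isinstance(tree, tuple):
        return tree  # a leaf that is not the pruned one
    kept = [child for child in (prune(c, subtree) for c in tree)
            if child is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)


def regraft(tree, subtree, target):
    """Attach subtree as the sister of target (a leaf or a clade of tree)."""
    if tree == target:
        return (tree, subtree)
    if not isinstance(tree, tuple):
        return tree
    return tuple(regraft(child, subtree, target) for child in tree)


def spr(tree, subtree, target):
    """One SPR move: prune subtree, then regraft it next to target.

    Assumes target is not located inside the pruned subtree.
    """
    return regraft(prune(tree, subtree), subtree, target)


species_tree = (("A", "B"), ("C", "D"))
# Moving B next to C yields a topology in which B groups with C.
moved = spr(species_tree, "B", "C")
```

If a gene tree matched `moved` rather than `species_tree`, one possible reading of the move would be a transfer from the lineage of C into the lineage of B, although other explanations would need to be ruled out.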
The computational challenge lies in finding the optimal edit path, i.e., the one that requires the fewest steps,[60][61] and different strategies are used in solving the problem.
[63] The T-REX web server includes a number of HGT detection methods [56] (mostly SPR-based) and allows users to calculate the bootstrap support of the inferred transfers.
Reconciliation methods can rely on a parsimonious or a probabilistic framework to infer the most likely scenario(s), where the relative cost/probability of D, T, L events can be fixed a priori or estimated from the data.
[64] The space of DTL reconciliations and their parsimony costs—which can be extremely vast for large multi-copy gene family trees—can be efficiently explored through dynamic programming algorithms.
[64][65][66] In some programs, uncertain parts of the gene tree topology can be refined so that the tree better fits both an evolutionary scenario and the initial sequence alignment.
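Under the parsimony framework, fixing the costs of duplication (D), transfer (T) and loss (L) events a priori determines which candidate reconciliation is preferred. The sketch below only scores scenarios whose event counts are already known; it does not enumerate reconciliations, which real programs do through dynamic programming. The class, the function names and the default costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Event counts for one candidate reconciliation (illustrative only)."""
    name: str
    duplications: int
    transfers: int
    losses: int


def parsimony_cost(s, cost_d=2.0, cost_t=3.0, cost_l=1.0):
    """Weighted sum of D, T and L events; the default costs are assumptions."""
    return (cost_d * s.duplications
            + cost_t * s.transfers
            + cost_l * s.losses)


def most_parsimonious(scenarios, **costs):
    """Return every scenario that reaches the minimum total cost."""
    best = min(parsimony_cost(s, **costs) for s in scenarios)
    return [s for s in scenarios if parsimony_cost(s, **costs) == best]


candidates = [
    Scenario("single transfer", duplications=0, transfers=1, losses=0),
    Scenario("duplication then losses", duplications=1, transfers=0, losses=3),
]
preferred_default = most_parsimonious(candidates)
preferred_costly_transfer = most_parsimonious(candidates, cost_t=6.0)
```

With the default costs the single transfer is cheaper (3 versus 5), whereas doubling the transfer cost makes the duplication-and-loss explanation preferred, which shows how sensitive the inference is to the chosen costs.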
Thus, the minimum number of foreign top BLAST hits required to conclude that a gene was transferred depends strongly on the taxonomic coverage of sequence databases.
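This dependence on database coverage can be illustrated with a minimal sketch that scores a gene by the fraction of its top hits assigned to a foreign clade. The input is assumed to be a pre-parsed list of (clade, bit score) pairs that excludes the query genome itself; the function names, `top_n` and the threshold are hypothetical and do not correspond to any BLAST option.

```python
def foreign_hit_fraction(hits, own_clade, top_n=10):
    """Fraction of the top_n highest-scoring hits that fall outside own_clade.

    hits: iterable of (clade, bit_score) pairs, assumed to exclude self-hits.
    """
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
    if not top:
        return 0.0
    return sum(1 for clade, _ in top if clade != own_clade) / len(top)


def is_hgt_candidate(hits, own_clade, top_n=10, min_foreign_fraction=0.8):
    """Flag a gene when most of its best hits are foreign (threshold is an assumption)."""
    return foreign_hit_fraction(hits, own_clade, top_n) >= min_foreign_fraction


# Sparse database: the few bacterial relatives score poorly.
sparse_db_hits = [("Archaea", 500), ("Archaea", 480), ("Bacteria", 300),
                  ("Archaea", 450), ("Archaea", 440)]
# Same gene after close bacterial relatives are added to the database.
dense_db_hits = sparse_db_hits + [("Bacteria", 520), ("Bacteria", 510),
                                  ("Bacteria", 505)]

flag_sparse = is_hgt_candidate(sparse_db_hits, "Bacteria", top_n=4)
flag_dense = is_hgt_candidate(dense_db_hits, "Bacteria", top_n=4)
```

The same gene is flagged against the sparse database but not against the denser one, so a fixed threshold cannot be interpreted without knowing how well the query's own clade is sampled.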
[78] The molecular clock hypothesis posits that homologous genes evolve at an approximately constant rate across different species.
[80] Simple approaches compare the distribution of similarity scores of particular sequences and their orthologous counterparts in other species; HGT events are inferred from outliers.
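One way to picture such an outlier test is a robust z-score on per-gene similarity to orthologs in a reference species. The scores, the gene names, the use of the median absolute deviation and the cutoff of 3.5 are all assumptions for the sake of the sketch; published methods define both the scores and the outlier criterion in their own ways.

```python
from statistics import median


def similarity_outliers(scores, cutoff=3.5):
    """Return {gene: modified z-score} for genes whose similarity score is
    far from the median of all genes.

    scores: {gene: similarity to its ortholog in a reference species}.
    Uses the median absolute deviation (MAD); the cutoff is an assumption.
    """
    values = list(scores.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return {}  # no spread to measure deviations against
    z = {gene: 0.6745 * (v - med) / mad for gene, v in scores.items()}
    return {gene: score for gene, score in z.items() if abs(score) > cutoff}


# Hypothetical similarity scores; g6 is far less similar than the rest.
toy_scores = {"g1": 0.80, "g2": 0.82, "g3": 0.78,
              "g4": 0.81, "g5": 0.79, "g6": 0.35}
outliers = similarity_outliers(toy_scores)
```

An outlier is only a candidate: an unusually fast- or slow-evolving gene would also stand out without any transfer being involved.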
In addition, the method allows inference of potential donor and recipient species and provides an estimation of the time since the HGT event.
Absence of a homolog in some members of a group of closely related species is an indication that the examined gene might have arrived via an HGT event.
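A patchy presence/absence pattern of this kind can be extracted with a few set operations. The data structure (a mapping from each gene family to the set of genomes that contain a homolog) and the function name are assumptions made for illustration.

```python
def patchy_genes(presence, clade):
    """Return {gene: sorted clade members lacking it} for genes found in
    some, but not all, members of a group of closely related genomes.

    presence: {gene: set of genome names that contain a homolog}.
    """
    clade = set(clade)
    result = {}
    for gene, genomes in presence.items():
        present = clade & set(genomes)
        missing = clade - present
        if present and missing:
            result[gene] = sorted(missing)
    return result


# Hypothetical homolog table for three closely related strains.
toy_presence = {
    "geneA": {"s1", "s2", "s3"},        # found in every strain
    "geneB": {"s1"},                    # found in one strain only
    "geneC": {"s1", "s2", "outgroup"},  # missing from s3
}
patchy = patchy_genes(toy_presence, {"s1", "s2", "s3"})
```

Genes flagged in this way remain candidates only, since lineage-specific gene loss produces the same pattern.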
[84] Marked portions of strain-specific genes were found to have no significant hit in the reference database, and were possibly acquired by HGT from other bacteria.
[95] This method of detection is, however, restricted to the sites in common to all analysed sequences, limiting the analysis to a group of closely related organisms.
[14] Parametric and phylogenetic methods draw on different sources of information; it is therefore difficult to make general statements about their relative performance.
Nonetheless, studies comparing several phylogenetic methods in a simulation framework could provide a quantitative assessment of their respective performance, and thus help biologists choose appropriate tools objectively.
[108] This article was adapted from the following source under a CC BY 4.0 license (2015) (reviewer reports): Matt Ravenhall; Nives Škunca; Florent Lassalle; Christophe Dessimoz (May 2015).