RF distances have been criticized as biased,[3] but they represent a relatively intuitive measure of the distances between phylogenetic trees and therefore remain widely used (the original 1981 paper describing Robinson-Foulds distances[1] was cited more than 2700 times by 2023 based on Google Scholar).
Nevertheless, the biases inherent to RF distances suggest that researchers should consider using "Generalized" Robinson–Foulds metrics,[4] which may have better theoretical and practical performance and avoid the biases and misleading attributes of the original metric.
The metric is defined in terms of two operations on trees. The first, α, contracts an edge (combining the nodes, creating a union of their sets).
The second, α⁻¹, expands an edge (decontraction), where the set can be split in any fashion.
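The contraction operation described above can be sketched in a few lines. This is only an illustration under assumed data structures: the tree is stored as a dictionary mapping each node to the set of its neighbours, with a second dictionary mapping each node to its set of labels. The function name `contract_edge` and both dictionaries are hypothetical and do not come from the original paper.

```python
def contract_edge(adjacency, labels, u, v):
    """Contract the edge {u, v}: merge node v into node u.

    The merged node keeps the name u, inherits all neighbours of v,
    and carries the union of the two label sets.
    Returns new dictionaries; the inputs are left unmodified.
    """
    if v not in adjacency[u]:
        raise ValueError("u and v are not joined by an edge")

    # Copy everything except the node that disappears.
    new_adj = {n: set(nb) for n, nb in adjacency.items() if n != v}
    new_labels = {n: set(ls) for n, ls in labels.items() if n != v}

    new_adj[u].discard(v)
    for w in adjacency[v]:
        if w != u:
            # Re-attach each former neighbour of v to u.
            new_adj[w].discard(v)
            new_adj[w].add(u)
            new_adj[u].add(w)

    # The merged node holds the union of both label sets.
    new_labels[u] |= labels[v]
    return new_adj, new_labels


# A four-leaf tree: internal nodes x and y joined by one internal edge.
adjacency = {
    "x": {"a", "b", "y"}, "y": {"c", "d", "x"},
    "a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"},
}
labels = {"x": set(), "y": set(),
          "a": {"A"}, "b": {"B"}, "c": {"C"}, "d": {"D"}}

star_adj, star_labels = contract_edge(adjacency, labels, "x", "y")
```

Contracting the single internal edge of this four-leaf tree collapses it into a star in which all four leaves attach to one node. The inverse operation would reintroduce an edge and must choose how to divide the neighbours and labels between the two resulting nodes.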
The RF distance corresponds to an equivalent similarity metric that reflects the resolution of the strict consensus of two trees, first used to compare trees in 1980.[6]
In their 1981 paper,[1] Robinson and Foulds proved that the distance is in fact a metric.
In 1985, Day gave an algorithm based on perfect hashing that computes this distance in time linear in the number of nodes in the trees.
A randomized algorithm that uses hash tables that are not necessarily perfect has been shown to approximate the Robinson-Foulds distance with a bounded error in sublinear time.
In phylogenetics, the metric is often used to compute a distance between two trees.
The treedist program in the PHYLIP suite offers this function, as does the RAxML_standard package, the DendroPy Python library (under the name "symmetric difference metric"), and R packages TreeDist (RobinsonFoulds() function) and phangorn (treedist() function).
For comparing groups of trees, the fastest implementations include HashRF and MrsRF.
The Robinson–Foulds metric has also been used in quantitative comparative linguistics to compute distances between trees that represent how languages are related to each other.
The RF metric remains widely used because the idea of using the number of splits that differ between a pair of trees is a relatively intuitive way to assess the differences among trees for many systematists.
This is the primary strength of the RF distance and the reason for its continued use in phylogenetics.
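The split-counting idea can be illustrated with a short Python sketch. It assumes trees are written as nested tuples of leaf labels, that both trees carry the same leaves, and that labels are mutually comparable (so that `min` works). The function names are hypothetical, and this is a straightforward set-based computation, not Day's linear-time algorithm or the method used by any of the packages mentioned above.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial splits (bipartitions) of an unrooted tree."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixes which side of a split is stored
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Ignore trivial splits that separate fewer than two leaves.
        if len(below) >= 2 and len(other) >= 2:
            # Store the side that does not contain the anchor leaf.
            found.add(other if anchor in below else below)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have identical leaf sets")
    return len(splits(tree_a) ^ splits(tree_b))


t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "B"), ("C", "E"), "D")
t3 = (("A", "C"), ("B", "D"), "E")
```

Here `t1` and `t2` share the split AB|CDE and differ in one split each, so their distance is 2, whereas `t1` and `t3` share no non-trivial split and are at distance 4.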
Of course, the number of splits that differ between a pair of trees depends on the number of taxa in the trees so one might argue that this unit is not meaningful.
However, it is straightforward to normalize RF distances so they range between zero and one.
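One common way to carry out this normalization, sketched below, is to divide by the largest value the distance can take. The sketch assumes two fully resolved (binary) unrooted trees on the same n leaves, each of which has n − 3 non-trivial splits, so the maximum is 2(n − 3). The function name is hypothetical, and some software reports a halved RF value, in which case the divisor would need to be adjusted accordingly.

```python
def normalized_rf(rf, n_leaves):
    """Scale an RF distance to the range [0, 1].

    Assumes both trees are binary, unrooted, and share n_leaves leaves,
    so each tree contributes at most n_leaves - 3 non-trivial splits.
    """
    if n_leaves < 4:
        raise ValueError("at least four leaves are needed for a non-trivial split")
    max_rf = 2 * (n_leaves - 3)
    return rf / max_rf
```

For two five-leaf trees, an RF distance of 4 therefore normalizes to 1.0 (no shared splits) and a distance of 2 normalizes to 0.5.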
The RF metric also suffers from a number of theoretical and practical shortcomings.[7][5] Another issue to consider when using RF distances is that differences in one clade may be trivial (perhaps if the clade resolves three species within a genus differently) or may be fundamental (if the clade is deep in the tree and defines two fundamental subgroups, such as mammals and birds).
Regardless of the behaviour of any specific tree distance, a practicing evolutionary biologist might view some tree rearrangements as "important" and other rearrangements as "trivial".[4]
The best-performing generalized Robinson–Foulds distances have a basis in information theory, and measure the distance between trees in terms of the quantity of information that the trees' splits hold in common (measured in bits).
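To give a sense of what measuring a split in bits can mean, the sketch below computes one such quantity: the negative base-2 logarithm of the proportion of binary unrooted trees that contain a given split. It relies on the standard count of (2n − 5)!! binary unrooted trees on n leaves. The function names are hypothetical, and this covers only the information content of a single split, not a full generalized Robinson–Foulds distance, which additionally requires pairing up splits between the two trees.

```python
from math import log2


def double_factorial(k):
    """Return k!! for odd k >= -1, using the convention (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def split_information(size_a, size_b):
    """Information content, in bits, of a split dividing the leaves
    into groups of size_a and size_b.

    Computed as -log2 of the fraction of binary unrooted trees on
    size_a + size_b leaves that contain the split.
    """
    n = size_a + size_b
    if size_a < 1 or size_b < 1 or n < 4:
        raise ValueError("need two non-empty groups and at least four leaves")
    # Binary trees consistent with the split.
    consistent = double_factorial(2 * size_a - 3) * double_factorial(2 * size_b - 3)
    # All binary unrooted trees on n leaves.
    total = double_factorial(2 * n - 5)
    return log2(total) - log2(consistent)
```

Under this measure a trivial split, which every tree contains, carries zero bits, while an even split of six leaves (3|3) carries more information than an uneven one (2|4), because fewer trees are consistent with it.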