Phylogenetic profiling

A number of these techniques were developed by David Eisenberg and colleagues; phylogenetic profile comparison was introduced in 1999 by Pellegrini et al.[1] More than 2,000 species of bacteria, archaea, and eukaryotes are now represented by complete genome sequences.

In the original, binary formulation, the presence or absence of a given protein family in each genome is encoded as 1 (present) or 0 (absent). The resulting string of 1s and 0s, one position per genome, is the phylogenetic profile of that family.
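The binary encoding can be illustrated with a minimal Python sketch. The genome names, family names, and presence sets below are hypothetical toy data, not drawn from any real study. Hamming distance is used here only as one simple way to compare two profiles; it is an illustrative assumption and not necessarily the measure used in the original work.

```python
from itertools import combinations

# Hypothetical toy data: the genomes surveyed, in a fixed order, and the set of
# genomes in which each (made-up) protein family was detected.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D", "genome_E"]
PRESENCE = {
    "family_1": {"genome_A", "genome_C", "genome_D"},
    "family_2": {"genome_A", "genome_C", "genome_D"},
    "family_3": {"genome_B", "genome_E"},
    "family_4": {"genome_A", "genome_B", "genome_C", "genome_D"},
}


def build_profile(present_in, genomes):
    """Return a binary profile: 1 if the family is present in a genome, else 0."""
    return tuple(1 if genome in present_in else 0 for genome in genomes)


def hamming_distance(profile_a, profile_b):
    """Count the genomes at which two equal-length profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def rank_pairs(profiles):
    """Return (distance, family, family) tuples, most similar pairs first."""
    pairs = [
        (hamming_distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs)


# One profile per family, with positions in the same order as GENOMES.
PROFILES = {
    family: build_profile(present_in, GENOMES)
    for family, present_in in PRESENCE.items()
}

if __name__ == "__main__":
    for family, profile in PROFILES.items():
        print(family, "".join(str(bit) for bit in profile))
    for distance, fam_a, fam_b in rank_pairs(PROFILES):
        print(fam_a, fam_b, distance)
```

In this toy data, family_1 and family_2 have identical profiles (distance 0) and would be the first candidates for a shared function, while family_1 and family_3 never co-occur and differ at every genome.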

A biological process such as photosynthesis, methanogenesis, or histidine biosynthesis may require the concerted action of many proteins.

Phylogenetic profiling has led to numerous discoveries in biology, including previously unknown enzymes in metabolic pathways, transcription factors that bind to conserved regulatory sites, and explanations for roles of certain mutations in human disease.

First, co-occurrence of two protein families often represents recent common ancestry of two species rather than a conserved functional relationship; disambiguating these two sources of correlation may require improved statistical methods.