Protein function prediction

[2] Researchers can query this database with a protein name or accession number to retrieve associated Gene Ontology (GO) terms or annotations based on computational or experimental evidence.

While techniques such as microarray analysis, RNA interference, and the yeast two-hybrid system can be used to experimentally demonstrate the function of a protein, advances in sequencing technologies mean that new sequences now become available much faster than proteins can be experimentally characterized.

[6] For example, the yeast Gal1 and Gal3 proteins are paralogs (73% identity and 92% similarity) that have evolved very different functions: Gal1 is a galactokinase, whereas Gal3 is a transcriptional inducer.
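Percent identity and percent similarity, as quoted for Gal1 and Gal3, are both computed over the columns of a pairwise alignment. The Python sketch below shows only that arithmetic, under stated assumptions: the input is an already-computed gapped alignment, columns containing a gap are skipped (one of several conventions in use), and "similar" residues are defined by a small hand-picked grouping that is hypothetical. Published figures are usually based on a substitution matrix such as BLOSUM62, so this sketch will not reproduce them exactly.

```python
# Hypothetical, simplified groups of "similar" amino acids, for illustration only.
# Real tools usually call a pair similar when its substitution-matrix score is positive.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _similar(a: str, b: str) -> bool:
    """Return True if residues a and b are identical or share a group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two aligned sequences.

    Both strings must be the same length, with '-' marking gaps.
    Columns where either sequence has a gap are skipped.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aln1.upper(), aln2.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0, 0.0
    identical = sum(a == b for a, b in pairs)
    similar = sum(_similar(a, b) for a, b in pairs)
    n = len(pairs)
    return 100 * identical / n, 100 * similar / n


# Usage with a toy alignment (not real Gal1/Gal3 sequence):
ident, sim = identity_and_similarity("MKVLIT-E", "MRVIIT-D")
```

Similarity is always at least as high as identity, because every identical pair also counts as similar; this is why paralogs such as Gal1 and Gal3 report a larger similarity figure than identity figure.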

[12][20][21][22][23] The Structurally Aligned Local Sites of Activity (SALSA)[21] method, developed by Mary Jo Ondrechen and students, uses computed chemical properties of individual amino acids to identify local biochemically active sites.

This is complicated by the fact that some active sites do not form, and so effectively do not exist, until the protein undergoes conformational changes induced by the binding of small molecules.

This work was motivated by the realization that water molecules are visible in the electron density maps produced by X-ray crystallography.

This led to the idea of immersing the purified protein crystal in other solvents (e.g. ethanol or isopropanol).

The process is repeated for multiple solvents, and the combined data can then be used to identify potential active sites on the protein.
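The idea of combining results across solvents can be sketched as follows. This is a simplified illustration, not the published experimental analysis or the FTMAP algorithm: it assumes each solvent experiment has already been reduced to a list of 3D coordinates (in ångströms) of bound probe molecules, and it flags positions where probes from at least `min_solvents` different solvents lie within `radius` of one another. The function name `consensus_sites` and both parameter defaults are hypothetical.

```python
import math


def consensus_sites(probes_by_solvent, radius=4.0, min_solvents=3):
    """Find probe positions where probes from many different solvents cluster.

    probes_by_solvent maps a solvent name to a list of (x, y, z) coordinates
    of bound probe molecules. A position is reported when probes from at
    least `min_solvents` distinct solvents (its own included) lie within
    `radius` of it. Returns a list of (solvent, position, n_solvents).
    """
    hits = []
    for solvent, coords in probes_by_solvent.items():
        for p in coords:
            # Set of solvents with at least one probe near position p.
            nearby = {
                other
                for other, other_coords in probes_by_solvent.items()
                if any(math.dist(p, q) <= radius for q in other_coords)
            }
            if len(nearby) >= min_solvents:
                hits.append((solvent, p, len(nearby)))
    return hits
```

Counting distinct solvents, rather than raw probe numbers, is a design choice meant to favour regions bound by many chemically different molecules over a spot where a single solvent happens to bind repeatedly.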

Many of the newer methods for protein function prediction are not based on comparison of sequence or structure as above, but on some type of correlation between novel genes/proteins and those that already have annotations.

Several methods have been developed to predict gene function based on the local genomic or phylogenomic context and structure of genes. Phylogenetic profiling is based on the observation that two or more proteins with the same pattern of presence or absence across many different genomes most likely have a functional link.

[3][28] For example, proteins involved in the same metabolic pathway tend to be either present together in a genome or absent altogether, suggesting that these genes work together in a functional context.
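Phylogenetic profiling can be sketched with binary presence/absence vectors, one entry per genome. The minimal Python example below, using hypothetical protein names and toy data, links two proteins when their profiles differ in at most `max_distance` genomes (the Hamming distance). Practical methods typically use more robust measures, such as mutual information or correlation, and may correct for the relatedness of the genomes; this sketch does neither.

```python
def hamming(p, q):
    """Number of genomes in which exactly one of the two proteins is present."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=0):
    """Return protein pairs whose presence/absence profiles differ in at most
    `max_distance` genomes.

    profiles maps a protein name to a tuple of 0/1 values, one per genome,
    with every tuple using the same genome order.
    """
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(profiles[a], profiles[b]) <= max_distance:
                pairs.append((a, b))
    return pairs


# Toy profiles over five hypothetical genomes (1 = present, 0 = absent).
profiles = {
    "P1": (1, 1, 0, 1, 0),
    "P2": (1, 1, 0, 1, 0),
    "P3": (1, 1, 1, 1, 1),
    "P4": (0, 1, 0, 0, 1),
}
```

Here `linked_pairs(profiles)` links only P1 and P2, whose profiles are identical; note that a protein present in every genome, like P3, carries little information, which is one reason real methods prefer measures that account for how common each pattern is.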

In prokaryotes, clusters of genes that are physically close together in the genome are often conserved together through evolution and tend to encode proteins that interact or belong to the same operon.

[3] Thus, chromosomal proximity, also called the gene neighbour method,[31] can be used to predict functional similarity between proteins, at least in prokaryotes.
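A minimal sketch of the gene neighbour idea, assuming each genome has already been reduced to an ordered list of gene labels that are shared across genomes (for example, orthologue groups). It counts how many genomes place each unordered pair of genes next to each other and keeps pairs seen in at least `min_genomes` genomes. Gene orientation, intergenic distances and circular chromosomes are ignored, and the labels and threshold are hypothetical.

```python
from collections import Counter


def conserved_neighbours(genomes, min_genomes=2):
    """Count how many genomes place each unordered gene pair side by side.

    genomes is a list of gene-order lists, one per genome, using labels that
    are comparable across genomes. Pairs adjacent in at least `min_genomes`
    genomes are returned as {(gene_a, gene_b): count}.
    """
    counts = Counter()
    for order in genomes:
        # Use a set so a pair adjacent twice in one genome is counted once.
        adjacent = {
            frozenset(pair)
            for pair in zip(order, order[1:])
            if pair[0] != pair[1]
        }
        counts.update(adjacent)
    return {
        tuple(sorted(pair)): n
        for pair, n in counts.items()
        if n >= min_genomes
    }


# Toy gene orders for three hypothetical genomes.
genomes = [["a", "b", "c", "d"], ["x", "a", "b", "c"], ["c", "b", "a", "y"]]
```

In this toy input, `a`-`b` and `b`-`c` stay adjacent in all three genomes even though the third genome lists them in reverse order, so they are reported as candidate functional links, while pairs such as `c`-`d` appear only once.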

[40] For example, the developers of the bioPIXIE system used a wide variety of Saccharomyces cerevisiae (yeast) genomic data to produce a composite functional network for that species.

Many algorithms have been developed to predict function based on the integration of several data sources (e.g. genomic, proteomic, and protein interaction data).
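One common way to integrate heterogeneous data, shown here as a simplified sketch rather than the model used by bioPIXIE or any other specific system, is a naive Bayes combination. Each data source contributes a likelihood ratio for a gene pair being functionally linked, and the ratios are multiplied (added in log space) with a prior. The source names, likelihood ratios and prior odds below are hypothetical. The sources are assumed to be conditionally independent, which real data often violate, and only positive evidence is scored.

```python
import math


def composite_network(evidence, likelihood_ratios, prior_odds=0.01):
    """Combine several evidence sources into one posterior probability per gene pair.

    evidence maps a source name (e.g. "coexpression") to an iterable of gene
    pairs that source supports. likelihood_ratios maps each source to an
    assumed ratio P(evidence | linked) / P(evidence | not linked).
    Pairs with no supporting evidence are left out of the result.
    """
    log_odds = {}
    for source, pairs in evidence.items():
        step = math.log(likelihood_ratios[source])
        # Normalise to unordered pairs so (a, b) and (b, a) count once per source.
        for key in {frozenset(p) for p in pairs}:
            log_odds[key] = log_odds.get(key, math.log(prior_odds)) + step
    # Convert posterior log-odds to probabilities with the logistic function.
    return {
        tuple(sorted(k)): 1.0 / (1.0 + math.exp(-lo))
        for k, lo in log_odds.items()
    }


# Hypothetical evidence and likelihood ratios.
evidence = {
    "coexpression": {("A", "B"), ("A", "C")},
    "two_hybrid": {("A", "B")},
}
likelihood_ratios = {"coexpression": 10.0, "two_hybrid": 10.0}
network = composite_network(evidence, likelihood_ratios)
```

With a prior odds of 0.01, the pair supported by both sources reaches odds of 1 (probability 0.5), while the pair supported by co-expression alone stays at odds of 0.1 (probability about 0.09), illustrating how agreement between independent sources strengthens a predicted link.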

[39][42] Disadvantages of some function prediction algorithms have included a lack of accessibility and the time required for analysis.

[44] Mantis: A consensus-driven function prediction tool that dynamically integrates multiple reference databases.

Part of a multiple sequence alignment of four different hemoglobin protein sequences. Similar protein sequences usually indicate shared functions.
An alignment of the toxic proteins ricin and abrin. Structural alignments may be used to determine if two proteins have similar functions even when their sequences differ.
Computational solvent mapping of the AMA1 protein using fragment-based computational solvent mapping (FTMAP): the surface of AMA1 is scanned computationally with 16 probes (small organic molecules), and the locations where the probes cluster are marked as colorful regions on the protein surface.[25]
A conserved operon in three bacterial genomes (here: genes involved in tryptophan biosynthesis). The conserved order suggests that these genes act together.
An example protein interaction network, produced through the STRING web resource. Patterns of protein interactions within networks are used to infer function. Here, products of the bacterial trp genes coding for tryptophan synthase are shown to interact with themselves and other, related proteins.
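The inference described in the network caption is often framed as "guilt by association". The sketch below implements the simplest version, majority voting over direct interaction partners: an unannotated protein is assigned the function terms most frequent among its annotated neighbours. The edge list and term labels are hypothetical toy data, and self-interactions are ignored. More elaborate methods weight edges or propagate labels beyond immediate neighbours.

```python
from collections import Counter


def predict_by_neighbours(edges, annotations, protein, top_n=1):
    """Rank function terms by how many of a protein's interaction partners carry them.

    edges: iterable of (protein_a, protein_b) interactions (undirected).
    annotations: dict mapping annotated proteins to sets of function terms.
    Returns up to `top_n` terms, most frequent first.
    """
    edges = list(edges)
    partners = {b for a, b in edges if a == protein}
    partners |= {a for a, b in edges if b == protein}
    partners.discard(protein)  # ignore self-interactions
    votes = Counter(
        term for p in partners for term in annotations.get(p, ())
    )
    return [term for term, _ in votes.most_common(top_n)]


# Hypothetical network around an unannotated protein "X".
edges = [("X", "A"), ("B", "X"), ("X", "C"), ("X", "X")]
annotations = {
    "A": {"tryptophan biosynthesis"},
    "B": {"tryptophan biosynthesis", "amino acid transport"},
    "C": {"DNA repair"},
}
```

Calling `predict_by_neighbours(edges, annotations, "X")` returns `["tryptophan biosynthesis"]`, the only term shared by two of X's three partners.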