Substitution matrix

[1] In the process of evolution, from one generation to the next the amino acid sequences of an organism's proteins are gradually altered through the action of DNA mutations.

(Here, a residue refers to an amino acid stripped of a hydrogen and/or a hydroxyl group and inserted in the polymeric chain of a protein.)

Furthermore, mutating an amino acid to a residue with significantly different properties could affect the folding and/or activity of the protein.

[2] If we have two amino acid sequences in front of us, we should be able to say something about how likely they are to be derived from a common ancestor, or homologous.

One of the first amino acid substitution matrices, the PAM (Point Accepted Mutation) matrix, was developed by Margaret Dayhoff in the 1970s.

Because very closely related homologs are used, the observed mutations are not expected to significantly change the common functions of the proteins.

Thus the observed substitutions (by point mutations) are considered to be accepted by natural selection.

To create a PAM1 substitution matrix, a group of very closely related sequences with mutation frequencies corresponding to one PAM unit is chosen.

Based on collected mutational data from this group of sequences, a substitution matrix can be derived.
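As an illustration of how scores might be derived from such data, the following is a minimal sketch that turns counts of aligned residue pairs into log-odds scores, comparing how often a pair is observed with how often it would be expected by chance. It is a simplified stand-in and not Dayhoff's exact procedure; the function name `log_odds_matrix`, the input format, and the choice of base-2 logarithms are assumptions made for this example.

```python
import math
from collections import Counter

def log_odds_matrix(aligned_pairs):
    """Build a symmetric log-odds score table from aligned residue pairs.

    aligned_pairs is a hypothetical input: an iterable of (residue, residue)
    tuples read off the columns of trusted pairwise alignments.
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count each pair in both orders so the resulting table is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Background frequency of each residue, estimated from the same data.
    residue_freq = Counter()
    for (a, _b), n in pair_counts.items():
        residue_freq[a] += n / total

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total
        expected = residue_freq[a] * residue_freq[b]
        # Positive: the pair occurs more often than chance; negative: less often.
        scores[(a, b)] = math.log2(observed / expected)
    return scores
```

Pairs that never occur in the input receive no entry in this sketch; a practical implementation would need pseudocounts or a much larger data set to score them.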

Dayhoff's methodology of comparing closely related species turned out not to work very well for aligning evolutionarily divergent sequences.

The BLOSUM (BLOck SUbstitution Matrix) series of matrices rectifies this problem.

The probabilities used in the matrix calculation are computed by looking at "blocks" of conserved sequences found in multiple protein alignments.
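The counting step can be sketched as follows, assuming a block is given as a list of equal-length, ungapped sequence segments. The function name `block_pair_frequencies` and the input format are hypothetical, and the clustering of near-identical sequences used in the real BLOSUM construction is deliberately left out.

```python
from collections import Counter
from itertools import combinations

def block_pair_frequencies(block):
    """Return relative frequencies of unordered residue pairs in one block."""
    lengths = {len(row) for row in block}
    if len(lengths) != 1:
        raise ValueError("all rows of a block must have the same length")

    counts = Counter()
    for column in zip(*block):
        # Every pair of sequences contributes one residue pair per column.
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1

    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}
```

For the toy block `["AL", "AL", "VL"]`, the first column contributes one A-A pair and two A-V pairs, and the second column contributes three L-L pairs. Frequencies of this kind could then be compared with background residue frequencies to obtain log-odds scores.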

A number of newer substitution matrices have been proposed to deal with inadequacies in earlier designs.

The real substitution rates in a protein depend not only on the identity of the amino acid, but also on the specific structural or sequence context it is in.

[8] These context-specific substitution matrices generally improve alignment quality at some cost in speed, but they are not yet widely used.

"Transversion" is the term used to indicate the slower-rate substitutions that change a purine to a pyrimidine or vice versa (A ↔ C, A ↔ T, G ↔ C, and G ↔ T).
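The distinction can be expressed as a short classification routine. This is a minimal sketch for DNA bases only; the function name `classify_substitution` and the returned labels are assumptions made for this example. A substitution that stays within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any other substitution is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Label a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        return "identical"
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "C")` returns `"transversion"`, while `classify_substitution("A", "G")` returns `"transition"`.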