ProbCons

In bioinformatics and proteomics, ProbCons is open-source software for probabilistic consistency-based multiple alignment of amino acid sequences.

It is one of the most accurate protein multiple sequence alignment programs, having repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT.[1][2]

The following describes the basic outline of the ProbCons algorithm.[3]

For every pair of sequences {\displaystyle x} and {\displaystyle y}, compute the probability that letters {\displaystyle x_{i}} and {\displaystyle y_{j}} are paired in an alignment {\displaystyle a} that is generated by the model.

{\displaystyle {\begin{aligned}P(x_{i}\sim y_{j}|x,y)\ {\overset {\underset {\mathrm {def} }{}}{=}}&\ \Pr[x_{i}\sim y_{j}{\text{ in some }}a|x,y]\\[8pt]=&\ \sum _{{\text{alignment }}a \atop {{\text{with }}x_{i}\sim y_{j}}}\Pr[a|x,y]\\[2pt]=&\ \sum _{{\text{alignment }}a}\mathbf {1} \{x_{i}\sim y_{j}\in a\}\Pr[a|x,y]\end{aligned}}}

Here {\displaystyle \mathbf {1} \{x_{i}\sim y_{j}\in a\}} is equal to 1 if {\displaystyle x_{i}} and {\displaystyle y_{j}} are paired in the alignment {\displaystyle a}, and 0 otherwise.
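The definition of the pairing probability can be illustrated with a brute-force sketch in Python. It enumerates every alignment of two very short sequences, normalises a weight per alignment into a distribution, and sums the probability of the alignments that contain a given pair. The weighting function `toy_weight` and its parameters (`match`, `mismatch`, `gap`) are hypothetical stand-ins chosen for illustration; ProbCons itself derives these probabilities from a pair hidden Markov model, and real implementations use dynamic programming rather than enumeration.

```python
from itertools import combinations


def enumerate_alignments(n, m):
    """Yield every alignment of sequences of lengths n and m as a tuple of
    matched index pairs (i, j), strictly increasing in both coordinates."""
    for k in range(min(n, m) + 1):
        for xs in combinations(range(n), k):
            for ys in combinations(range(m), k):
                yield tuple(zip(xs, ys))


def toy_weight(a, x, y, match=4.0, mismatch=1.0, gap=0.5):
    """Hypothetical unnormalised weight of alignment a.
    This is an illustrative assumption, not the ProbCons pair HMM."""
    w = 1.0
    for i, j in a:
        w *= match if x[i] == y[j] else mismatch
    unpaired = len(x) + len(y) - 2 * len(a)
    return w * gap ** unpaired


def posterior_matrix(x, y):
    """P[i][j] = sum over alignments a of 1{(i, j) in a} * Pr[a | x, y]."""
    alignments = list(enumerate_alignments(len(x), len(y)))
    weights = [toy_weight(a, x, y) for a in alignments]
    total = sum(weights)  # normalising constant, so that Pr[a | x, y] sums to 1
    P = [[0.0] * len(y) for _ in x]
    for a, w in zip(alignments, weights):
        for i, j in a:
            P[i][j] += w / total
    return P


if __name__ == "__main__":
    for row in posterior_matrix("AC", "AC"):
        print([round(p, 3) for p in row])
```

Because each letter is paired at most once in any alignment, every row and column of the resulting matrix sums to at most 1.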

The accuracy of an alignment {\displaystyle a^{*}} with respect to another alignment {\displaystyle a} is defined as the number of common aligned pairs divided by the length of the shorter sequence.

Calculate the expected accuracy of a candidate alignment {\displaystyle a^{*}} for each pair of sequences:

{\displaystyle {\begin{aligned}E_{\Pr[a|x,y]}(\operatorname {acc} (a^{*},a))&=\sum _{a}\Pr[a|x,y]\operatorname {acc} (a^{*},a)\\&={\frac {1}{\min(|x|,|y|)}}\cdot \sum _{a}\sum _{x_{i}\sim y_{j}\in a^{*}}\mathbf {1} \{x_{i}\sim y_{j}\in a\}\Pr[a|x,y]\\&={\frac {1}{\min(|x|,|y|)}}\cdot \sum _{x_{i}\sim y_{j}\in a^{*}}P(x_{i}\sim y_{j}|x,y)\end{aligned}}}

This yields a maximum expected accuracy (MEA) alignment:

{\displaystyle E(x,y)=\arg \max _{a^{*}}\;E_{\Pr[a|x,y]}(\operatorname {acc} (a^{*},a))}
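Since the expected accuracy is a sum of pairing probabilities over the aligned pairs, a maximum expected accuracy alignment can be found with a Needleman-Wunsch-style dynamic program that has no gap penalties. The following Python sketch assumes the pairing probabilities are already available as a matrix `P`, where `P[i][j]` stands for the probability that letter i of the first sequence pairs with letter j of the second; the function name `mea_alignment` is chosen here for illustration.

```python
def mea_alignment(P):
    """Return (expected_accuracy, pairs) for the alignment that maximises
    the sum of P[i][j] over its aligned pairs (i, j)."""
    n = len(P)
    m = len(P[0]) if n else 0
    # best[i][j]: largest summed probability for the prefixes x[:i] and y[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + P[i - 1][j - 1],  # pair x_i with y_j
                best[i - 1][j],                        # leave x_i unpaired
                best[i][j - 1],                        # leave y_j unpaired
            )
    # Trace back, preferring gaps on ties so zero-gain pairs are not reported.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    shorter = min(n, m)
    expected_accuracy = best[n][m] / shorter if shorter else 0.0
    return expected_accuracy, pairs


if __name__ == "__main__":
    score, pairs = mea_alignment([[0.9, 0.0], [0.1, 0.8]])
    print(score, pairs)
```

Dividing the optimal sum by the length of the shorter sequence gives the expected accuracy of the returned alignment, matching the definition of accuracy used above.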

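The pairing probabilities for all pairs of sequences {\displaystyle x,y} from the set of all sequences {\displaystyle S} are now re-estimated using all intermediate sequences {\displaystyle z}:

{\displaystyle P'(x_{i}\sim y_{j}|x,y)={\frac {1}{|S|}}\sum _{z\in S}\sum _{1\leq k\leq |z|}P(x_{i}\sim z_{k}|x,z)\cdot P(z_{k}\sim y_{j}|z,y)}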

This step can be iterated.
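One pass of this re-estimation can be sketched in Python as an average, over every intermediate sequence, of the product of two pairing-probability matrices. The sketch assumes a dictionary `P` keyed by ordered pairs of sequence indices that holds a matrix for every ordered pair, including the identity matrix for a sequence paired with itself; the helper `complete` and the function `consistency_transform` are hypothetical names introduced for illustration.

```python
def complete(upper, lengths):
    """Build matrices for all ordered pairs from those given for a < b:
    identity for (a, a) and the transpose for (b, a)."""
    P = {}
    for a, la in enumerate(lengths):
        P[(a, a)] = [[1.0 if i == j else 0.0 for j in range(la)] for i in range(la)]
    for (a, b), M in upper.items():
        P[(a, b)] = M
        P[(b, a)] = [list(col) for col in zip(*M)]
    return P


def consistency_transform(P, lengths):
    """One pass of the re-estimation: for every pair x != y, average over all
    sequences z the product of the matrices for (x, z) and (z, y)."""
    n_seqs = len(lengths)
    new_P = {}
    for x in range(n_seqs):
        for y in range(n_seqs):
            if x == y:
                new_P[(x, y)] = P[(x, y)]  # self-pairing stays the identity
                continue
            out = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n_seqs):
                Pxz, Pzy = P[(x, z)], P[(z, y)]
                for i in range(lengths[x]):
                    for k in range(lengths[z]):
                        p_ik = Pxz[i][k]
                        if p_ik == 0.0:
                            continue  # no contribution through z_k
                        for j in range(lengths[y]):
                            out[i][j] += p_ik * Pzy[k][j]
            new_P[(x, y)] = [[v / n_seqs for v in row] for row in out]
    return new_P


if __name__ == "__main__":
    lengths = [1, 1, 1]
    P = complete({(0, 1): [[0.5]], (0, 2): [[1.0]], (1, 2): [[1.0]]}, lengths)
    print(consistency_transform(P, lengths)[(0, 1)])
```

In the small usage example, the third sequence pairs confidently with both of the others, so the re-estimated probability for the first two sequences rises above its initial value of 0.5. Calling the function again on its own output corresponds to iterating the step.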

Construct a guide tree by hierarchical clustering, using the MEA score as the sequence similarity score.

Cluster similarity is defined as a weighted average over pairwise sequence similarities.
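A guide tree of this kind can be sketched in Python as greedy agglomerative clustering: the two most similar clusters are merged repeatedly until one remains. The sketch assumes that the weighted average is weighted by cluster size, which is one common choice and not necessarily the exact weighting ProbCons uses; `build_guide_tree` and its arguments are hypothetical names, and the similarity values would in practice come from the MEA scores.

```python
from itertools import combinations


def build_guide_tree(names, sim):
    """Greedy hierarchical clustering.

    names: list of sequence names, indexed 0..n-1.
    sim:   dict mapping index pairs (a, b) to a similarity score.
    Returns a nested tuple such as ("C", ("A", "B")).
    """
    # Each cluster id maps to (subtree, number of sequences in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    s = {frozenset(pair): value for pair, value in sim.items()}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the most similar pair of current clusters.
        a, b = max(combinations(sorted(clusters), 2),
                   key=lambda pair: s[frozenset(pair)])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Similarity of the merged cluster to every remaining cluster:
        # size-weighted average of the two merged clusters' similarities.
        for c in clusters:
            s[frozenset((next_id, c))] = (
                size_a * s[frozenset((a, c))] + size_b * s[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


if __name__ == "__main__":
    print(build_guide_tree(["A", "B", "C"],
                           {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.4}))
```

The nested tuple fixes the order in which sequences and groups of sequences would be combined in a subsequent progressive alignment.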

Finally, compute the multiple sequence alignment (MSA) using progressive alignment or iterative alignment.