Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.
[5] Originally, the database was created to provide a semi-automated method of curating information on known protein families, in order to make genome annotation more efficient.
[9][10][11] The InterPro website allows users to submit protein or DNA sequences to search for matches to families in the Pfam database.
[12] Rather than performing a typical BLAST search, Pfam uses profile hidden Markov models, which give greater weight to matches at conserved sites. This allows better remote homology detection and makes them more suitable for annotating genomes of organisms with no well-annotated close relatives.
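The idea of weighting conserved sites more heavily can be illustrated with a deliberately simplified sketch. The code below is not Pfam's or HMMER's implementation: it builds position-specific log-odds scores from a toy ungapped alignment and omits the insert and delete states of a real profile HMM. The function names, the uniform background frequency and the pseudocount of 1 are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background distribution, chosen only for simplicity.
BACKGROUND = 1.0 / len(ALPHABET)


def column_scores(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (in bits) per alignment column."""
    scores = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        scores.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return scores


def score_sequence(seq, scores):
    """Sum the per-position scores of an ungapped sequence."""
    return sum(col[aa] for aa, col in zip(seq, scores))


# Toy alignment: column 0 is fully conserved (C), column 1 is variable.
toy_alignment = ["CA", "CD", "CE", "CF"]
toy_scores = column_scores(toy_alignment)

# A match at the conserved column earns more than a match at the variable one.
conserved_match = toy_scores[0]["C"]
variable_match = toy_scores[1]["A"]
```

In this toy model, matching the conserved cysteine contributes more to the total score than matching any residue in the variable column, which is the property that lets profile-based searches pick up distant relatives that retain only the key conserved positions.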
[15] New families come from a range of sources, primarily the PDB and analyses of complete proteomes that identify genes with no Pfam hit.
Each family's profile HMM is then searched against sequence databases, and all hits that reach a curated gathering threshold are classified as members of the protein family.
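The membership rule described here amounts to a score cutoff, which can be sketched as follows. This is an illustrative sketch only: the `Hit` class, the `family_members` function, the sequence identifiers and the scores are all hypothetical, and a single threshold is assumed for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical search hit: a sequence identifier and its bit score."""
    sequence_id: str
    bit_score: float


def family_members(hits, gathering_threshold):
    """Keep the hits whose score reaches the family's gathering threshold."""
    return [h.sequence_id for h in hits if h.bit_score >= gathering_threshold]


# Made-up scores, used only to show the cutoff in action.
hits = [Hit("seqA", 54.2), Hit("seqB", 21.0), Hit("seqC", 30.0)]
members = family_members(hits, gathering_threshold=30.0)
```

Because the threshold is curated per family, the same bit score could qualify a sequence for one family and fall short for another.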
Clans are groupings of related families that share a single evolutionary origin, as confirmed by structural, functional, sequence and HMM comparisons.
[17] To identify possible clan relationships, Pfam curators use the Simple Comparison Of Outputs Program (SCOOP) as well as information from the ECOD database.
Pfam-B contained a large number of small families derived from clusters produced by an algorithm called ADDA.
This allowed for better centralisation of updates, and grouping with other Xfam projects such as Rfam, TreeFam, iPfam and others, whilst retaining critical resilience provided by hosting from multiple centres.
[23] From circa 2014 to 2016, Pfam underwent a substantial reorganisation to further reduce the manual effort involved in curation and to allow for more frequent updates.
[24] Curating such a large database made it difficult to keep up with the volume of new families and updated information that needed to be added.
It is anticipated that while community involvement will greatly improve the level of annotation of these families, some will remain insufficiently notable for inclusion in Wikipedia, in which case they will retain their original Pfam description.
In release 26.0, developers moved to a new system that allowed registered users anywhere in the world to add or modify Pfam families.