Structural Classification of Proteins database

Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor.

Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.

[3] It was maintained by Alexey G. Murzin and his colleagues in the Centre for Protein Engineering until its closure in 2010 and subsequently at the Laboratory of Molecular Biology in Cambridge, England.

Since then SCOPe team from UC Berkeley has been responsible for updating the database in a compatible manner, with a combination of automated and manual methods.

For example, the "All-α proteins" class contains >280 distinct folds, including: Globin-like (core: 6 helices; folded leaf, partly opened), long alpha-hairpin (2 helices; antiparallel hairpin, left-handed twist) and Type I dockerin domains (tandem repeat of two calcium-binding loop-helix motifs, distinct from the EF-hand).

This is a largest grouping of proteins for which structural similarity is sufficient to indicate evolutionary relatedness and therefore share a common ancestor.

Human expertise is used to decide whether certain proteins are evolutionary related and therefore should be assigned to the same superfamily, or their similarity is a result of structural constraints and therefore they belong to the same fold.

Another database, FSSP, is purely automatically generated (including regular automatic updates) but offers no classification, allowing the user to draw their own conclusion as to the significance of structural relationships based on the pairwise comparisons of individual protein structures.

By 2009, the original SCOP database manually classified 38,000 PDB entries into a strictly hierarchical structure.

With the accelerating pace of protein structure publications, the limited automation of classification could not keep up, leading to a non-comprehensive dataset.

The Structural Classification of Proteins extended (SCOPe) database was released in 2012 with far greater automation of the same hierarchical system and is full backwards compatible with SCOP version 1.75.

Consequently, domains are not separated by strict fixed boundaries, but rather are defined by their relationships to the most similar other structures.

The Evolutionary Classification of Protein Domains (ECOD) database released in 2014 is a similar to SCOPe expansion of SCOP version 1.75.

Unlike the compatible SCOPe, it renames the class-fold-superfamily-family hierarchy into an architecture-X-homology-topology-family (A-XHTF) grouping, with the last level mostly defined by Pfam and supplemented by HHsearch clustering for uncategorized sequences.