The degree of similarity can be highly variable, with some repeats maintaining only a few conserved amino acid positions and a characteristic length.
As a "rule of thumb", short repetitive sequences (e.g. those shorter than 10 amino acids) may be intrinsically disordered and not part of any folded protein domain.[3][4][5]
Examples of disordered repetitive sequences include the 7-mer peptide repeats found in the RPB1 subunit of RNA polymerase II[6] and the tandem beta-catenin-binding or axin-binding linear motifs in APC (adenomatous polyposis coli).
[13] Sequence-based strategies that rely on homology search[14] or domain assignment[15][16] mostly underestimate TRs, because repeat units are often highly degenerate.
Alternatively, methods requiring no prior knowledge for the detection of repeated substrings can be based on self-comparison,[18][19] clustering[20][21] or hidden Markov models.
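
The self-comparison idea can be illustrated with a minimal sketch. It is not the algorithm of any published tool: it simply compares a sequence with a shifted copy of itself and reports, for each shift, the fraction of identical residues. The function names `period_scores` and `best_period` are hypothetical, and the toy input is four exact copies of the RPB1 heptad consensus YSPTSPS rather than a real protein sequence.

```python
def period_scores(seq, max_period=None):
    """Return {shift p: fraction of positions i with seq[i] == seq[i + p]}."""
    n = len(seq)
    if max_period is None:
        max_period = n // 2
    # A shift must leave at least one pair of residues to compare.
    max_period = min(max_period, n - 1)
    scores = {}
    for p in range(1, max_period + 1):
        matches = sum(1 for i in range(n - p) if seq[i] == seq[i + p])
        scores[p] = matches / (n - p)
    return scores


def best_period(seq, max_period=None):
    """Return the shift with the highest self-identity (smallest shift on ties)."""
    scores = period_scores(seq, max_period)
    if not scores:
        return None  # sequence too short to compare with itself
    # Shifts are inserted in ascending order, so max() keeps the smallest on ties.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    toy = "YSPTSPS" * 4  # assumption: idealised, perfectly conserved repeat
    print(best_period(toy))
    print(period_scores(toy)[7])
```

On degenerate repeats the identity fraction at the true period drops below 1, and multiples of the period can score as high or higher, which is one reason practical self-comparison methods rely on substitution scores and statistical significance rather than raw identity.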