[5] Tandem repeats of short oligopeptides that are rich in glycine, proline, serine or threonine are capable of forming flexible structures that bind ligands under certain pH and temperature conditions.
[18] They may even function as frame-shift checkpoints by shifting to an unusual amino acid content that makes the protein highly unstable or insoluble, which in turn triggers fast recycling before any further cellular damage occurs.
[8] A third explanation may be based on micro-evolutionary forces and, more specifically, on the bias of DNA polymerase slippage for certain di-, tri- or tetra-nucleotides.
[29][30] Because they originate from genetic instability, they may cause, at the DNA level, a certain region of the protein to expand or contract, and may even cause frame-shifts (phase-variants) that affect microbial pathogenicity or provide raw material for evolution.
[8][32] During early evolution, when only a few amino acids were available and the primary genetic code was still expanding its repertoire, the first proteins were assumed to be short, repetitive and, therefore, of low complexity.
[33][34] Thus, modern LCRs could represent primordial aspects of the evolution towards the protein world and may provide clues about the functions of the early proto-peptides.
Due to the high effective population size and short generation times of prokaryotes, the de novo emergence of a mildly or moderately deleterious amino acid repeat or LCR should quickly be filtered out by strong selective forces.
[8] The amino acids with the highest frequency in LCRs are glycine and alanine, with their respective codons GGC and GCC being the most frequent, as well as complementary.
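The complementarity of the two codons can be checked directly. The following is a minimal Python sketch, assuming that "complementary" is meant in the reverse-complement sense (each codon read 5'→3' pairs with the other on the opposite strand); the function name `reverse_complement` is illustrative only.

```python
# Watson-Crick base pairing for DNA, applied character by character.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite-strand sequence, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


glycine_codon = "GGC"
alanine_codon = "GCC"

# Assumption: "complementary" means reverse-complementary.
print(reverse_complement(glycine_codon) == alanine_codon)  # True
print(reverse_complement(alanine_codon) == glycine_codon)  # True
```

Under this reading, a run of GGC codons on one strand corresponds to a run of GCC codons on the other strand.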
Based on several different criteria and sources of data, Higgs and Pudritz[38] suggest G, A, D, E, V, S, P, I, L, T as the early amino acids of the genetic code.
[8] They thus hypothesize that, in a cell-free environment, the early genetic code may have also produced low complexity oligo-peptides from valine and leucine.
[8] However, later on, within a more complex cellular environment, these highly hydrophobic LCRs became inappropriate or even toxic from a protein interaction perspective and have been selected against ever since.
[8] Compression-based tools have also been used to perform such analysis, providing higher sensitivity while mitigating the risk of overestimation inherent in other methods.
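The general idea behind compression-based detection can be illustrated with a short Python sketch. It is not the method of any specific published tool: the general-purpose `zlib` compressor stands in for a dedicated sequence compressor, and the function names, window size, step and threshold are all assumptions chosen for illustration. A repetitive window compresses to far fewer bytes than a diverse one, so a low compressed-to-raw size ratio is used as a proxy for low complexity.

```python
import zlib


def compression_ratio(window: str) -> float:
    """Compressed size divided by raw size; lower means more repetitive."""
    raw = window.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def low_complexity_windows(seq, window=40, step=10, threshold=0.6):
    """Slide a fixed-size window and report (start, end, ratio) for each
    window whose ratio falls below the (hypothetical) threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        ratio = compression_ratio(chunk)
        if ratio < threshold:
            hits.append((start, start + window, ratio))
    return hits


# Illustrative input: a glycine/alanine repeat embedded between two
# copies of an arbitrary, compositionally diverse flanking sequence.
FLANK = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVK"
sequence = FLANK + "GA" * 30 + FLANK

for start, end, ratio in low_complexity_windows(sequence):
    print(start, end, round(ratio, 2))
```

Because `zlib` adds a fixed overhead to every output, the ratio of a short diverse window can exceed 1; any threshold therefore has to be calibrated for the chosen window size.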