The Sm proteins were first discovered as antigens targeted by so-called anti-Sm antibodies in a patient with a form of systemic lupus erythematosus (SLE), a debilitating autoimmune disease.
Individual LSm proteins assemble into a six- or seven-member doughnut-shaped ring (more properly termed a torus), which usually binds to a small RNA molecule to form a ribonucleoprotein complex.
The LSm torus helps the RNA molecule assume and maintain its proper three-dimensional structure.
The story of the discovery of the first LSm proteins begins with a young woman, Stephanie Smith, who was diagnosed in 1959 with systemic lupus erythematosus (SLE), eventually succumbing to complications of the disease in 1969 at the age of 22.
[1] During this period, she was treated at New York's Rockefeller University Hospital, under the care of Dr. Henry Kunkel and Dr. Eng Tan.
Like others with an autoimmune disease, SLE patients produce antibodies to antigens in their cells' nuclei, most frequently to their own DNA.
However, Kunkel and Tan found in 1966 that Smith produced antibodies to a set of nuclear proteins, which they named the "Smith antigen" (Sm Ag).
The Smith antigen was found to be a complex of ribonucleic acid (RNA) molecules and multiple proteins.
A set of uridine-rich small nuclear RNA (snRNA) molecules was part of this complex, and these were given the names U1, U2, U4, U5 and U6.
Four of these snRNAs (U1, U2, U4 and U5) were found to be tightly bound to several small proteins, which were named SmB, SmD, SmE, SmF, and SmG in decreasing order of size.
[3] After a few more modifications, the spliced pre-mRNA becomes messenger RNA (mRNA), which is then exported from the nucleus and translated into a protein by ribosomes.
In the bacterium Escherichia coli, the Sm-like protein HF-I encoded by the gene hfq was described in 1968 as an essential host factor for RNA bacteriophage Qβ replication.
The genome of Saccharomyces cerevisiae (baker's yeast) was sequenced in the mid-1990s, providing a rich resource for identifying homologs of these human proteins.
[6] In 1999, crystals of recombinant Sm proteins were prepared, allowing X-ray crystallography and determination of their atomic structure in three dimensions.
The exact chemical nature of this binding varies, but common motifs include stacking the heterocyclic base (often uracil) between planar side chains of two amino acids, hydrogen bonding to the heterocyclic base and/or the ribose, and salt bridges to the phosphate group.
In other cases, this facilitates modification or degradation of the RNA, or the assembly, storage, and intracellular transport of ribonucleoprotein complexes.
[9] The Sm ring is found in the nucleus of all eukaryotes (about 2.5 × 10⁶ copies per proliferating human cell), and has the best-understood functions.
The Sm ring permanently binds to the U1, U2, U4 and U5 snRNAs which form four of the five snRNPs that constitute the major spliceosome.
The Sm ring also permanently binds to the U11, U12 and U4atac snRNAs; these, together with U5, form four of the five snRNPs that constitute the minor spliceosome.
Experiments with Saccharomyces cerevisiae (budding yeast) mutants suggest that the Lsm2-8 ring assists the reassociation of the U4 and U6 snRNPs into the U4/U6 di-snRNP.
These may have a chaperone function in the SMN complex to assist the formation of the Sm ring on the Sm-class snRNAs.
[19] A large protein of unknown function, ataxin-2, associated with the neurodegenerative disease spinocerebellar ataxia type 2, also has an N-terminal LSm domain.
It is not present in all bacteria, but has been found in Pseudomonadota, Bacillota, Spirochaetota, Thermotogota, Aquificota, and one species of Archaea.
[21][22] A second bacterial LSm protein is YlxS (sometimes also called YhbC), which was first identified in the soil bacterium Bacillus subtilis.
Its function is unknown, but amino acid sequence homologs are found in virtually every bacterial genome sequenced to date, and it may be an essential protein.
[23] The middle domain of the small conductance mechanosensitive channel MscS in Escherichia coli forms a homoheptameric ring.
[24] This LSm domain has no apparent RNA-binding function, but the homoheptameric torus is part of the central channel of this membrane protein.
Based on the known functions of LSm proteins in eukaryotes and archaea, the ancestral function may have been processing of pre-ribosomal RNA, pre-transfer RNA, and pre-RNase P. Then, according to this hypothesis, the seven ancestral eukaryote LSm genes duplicated again to form seven pairs of Sm/LSm paralogs: LSm1/SmB, LSm2/SmD1, LSm3/SmD2, LSm4/SmD3, LSm5/SmE, LSm6/SmF and LSm7/SmG.
The LSm1/LSm8 paralog pair also seems to have originated prior to the last common eukaryote ancestor, for a total of at least 15 LSm protein genes.
Small nuclear ribonucleoproteins (snRNPs) assemble in a tightly orchestrated and regulated process that involves both the cell nucleus and cytoplasm.