CpG should not be confused with GpC, the latter meaning that a guanine is followed by a cytosine in the 5' → 3' direction of a single-stranded sequence.
CpG dinucleotides have long been observed to occur with a much lower frequency in the sequence of vertebrate genomes than would be expected due to random chance.
For example, in the human genome, which has a 42% GC content,[4] a pair of nucleotides consisting of cytosine followed by guanine would be expected to occur 0.21 × 0.21 = 4.41% of the time.
[5] This underrepresentation is a consequence of the high mutation rate of methylated CpG sites. Spontaneous deamination of a methylated cytosine yields a thymine, and the resulting G:T mismatches are often improperly resolved to A:T. Deamination of an unmethylated cytosine, by contrast, yields a uracil, which, as a base foreign to DNA, is quickly replaced with a cytosine by the base excision repair mechanism.
[17] Given the frequency of GC two-nucleotide sequences, the number of CpG dinucleotides is much lower than would be expected.
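The comparison between expected and observed dinucleotide frequencies can be illustrated with a minimal sketch. It assumes that C and G each account for half of the GC content and that neighbouring bases are independent, which is the simple random model behind the expectation. The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
def expected_cpg_fraction(gc_content: float) -> float:
    """Expected fraction of dinucleotide positions that read CpG.

    Assumption: C and G each make up half of the GC content, and
    adjacent bases are statistically independent.
    """
    p_c = p_g = gc_content / 2
    return p_c * p_g


def dinucleotide_fraction(seq: str, dinuc: str) -> float:
    """Observed fraction of adjacent base pairs (read 5' to 3') equal to dinuc."""
    seq = seq.upper()
    positions = len(seq) - 1  # number of adjacent pairs in the sequence
    if positions < 1:
        return 0.0
    hits = sum(1 for i in range(positions) if seq[i:i + 2] == dinuc)
    return hits / positions


# A 42% GC content gives an expectation of 0.21 * 0.21, about 4.41%.
expected = expected_cpg_fraction(0.42)

# Toy sequence (hypothetical) showing that CpG and GpC are counted separately.
toy = "ATGCGCATTAGCCGTA"
observed_cpg = dinucleotide_fraction(toy, "CG")  # C followed by G
observed_gpc = dinucleotide_fraction(toy, "GC")  # G followed by C
```

Reading order matters here: "CG" and "GC" are distinct dinucleotides on a single strand, so the two counts generally differ.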
[14] A 2002 study revised the rules of CpG island prediction to exclude other GC-rich genomic sequences such as Alu repeats.
Based on an extensive search on the complete sequences of human chromosomes 21 and 22, DNA regions greater than 500 bp were found more likely to be the "true" CpG islands associated with the 5' regions of genes if they had a GC content greater than 55%, and an observed-to-expected CpG ratio of 65%.
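The three criteria above (length, GC content, and observed-to-expected CpG ratio) can be sketched as a simple check on a candidate region. This is an illustration under stated assumptions, not the study's actual procedure: the observed-to-expected ratio is computed with the commonly used formula (number of CpGs × sequence length) / (number of Cs × number of Gs), which the text does not spell out, and the 65% figure is treated as a minimum. All function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed-to-expected CpG ratio.

    Assumed formula: (CpG count * length) / (C count * G count).
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg * len(seq) / (c * g)


def meets_island_criteria(seq: str,
                          min_len: int = 500,
                          min_gc: float = 0.55,
                          min_obs_exp: float = 0.65) -> bool:
    """Apply the thresholds quoted in the text to one candidate region."""
    return (len(seq) > min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) >= min_obs_exp)


# Artificial extremes, used only to exercise the thresholds.
cpg_dense = "CG" * 300   # 600 bp, all G/C, CpG at every other position
at_only = "AT" * 300     # 600 bp, no G or C at all
too_short = "CG" * 100   # CpG-dense but only 200 bp
```

A real scan would slide a window along a chromosome and merge adjacent qualifying windows; that step is omitted here.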
Humans have a dedicated enzyme, thymine-DNA glycosylase (TDG), that specifically replaces T's in T/G mismatches.
However, because CpGs are rare, TDG is theorised to be insufficiently effective in preventing a possibly rapid mutation of the dinucleotides.
[22] CpG islands also occur frequently in promoters for functional noncoding RNAs such as microRNAs.
[23] In humans, DNA methylation occurs at the 5 position of the pyrimidine ring of the cytosine residues within CpG sites to form 5-methylcytosines.
[24] In cancers, loss of expression of genes occurs about 10 times more frequently by hypermethylation of promoter CpG islands than by mutations.
[25] In contrast, in one study of colon tumors compared to adjacent normal-appearing colonic mucosa, 1,734 CpG islands were heavily methylated in tumors whereas these CpG islands were not methylated in the adjacent mucosa.
[27] A third study found more than 2,000 genes differentially methylated between colon cancers and adjacent mucosa.
[30] Thus microRNAs with hypermethylated promoters may be allowing over-expression of hundreds to thousands of genes in a cancer.
DNA repair genes are frequently repressed in cancers due to hypermethylation of CpG islands within their promoters.
[31] About seventeen types of cancer are frequently deficient in one or more DNA repair genes due to hypermethylation of their promoters.
Promoter hypermethylation of MLH1 occurs in 48% of non-small-cell lung cancer squamous cell carcinomas.
PARP1 and FEN1 are essential genes in the error-prone and mutagenic DNA repair pathway microhomology-mediated end joining.
Thus, CpG island hyper/hypo-methylation in the promoters of DNA repair genes is likely central to progression to cancer.
Since age has a strong effect on DNA methylation levels at tens of thousands of CpG sites, one can define a highly accurate biological clock (referred to as the epigenetic clock or DNA methylation age) in humans and chimpanzees.
[50] In the mouse brain, 4.2% of all cytosines are methylated, primarily in the context of CpG sites, forming 5mCpG.
At 24 hours after training, 9.2% of the genes in the genome of rat hippocampal neurons were differentially methylated.
There were 1,223 differentially methylated genes in the anterior cingulate cortex of mice four weeks after contextual fear conditioning.
[54] As reviewed in 2018,[55] in brain neurons, 5mC is oxidized by the ten-eleven translocation (TET) family of dioxygenases (TET1, TET2, TET3) to generate 5-hydroxymethylcytosine (5hmC).
Two reviews[56][57] summarize the large body of evidence for the essential role of reactive oxygen species (ROS) in memory formation.
The DNA demethylation of thousands of CpG sites during memory formation depends on initiation by ROS.
Adherence of OGG1 to the 5mCp-8-OHdG site recruits TET1, allowing TET1 to oxidize the 5mC adjacent to 8-OHdG, as shown in the first figure in this section.
Unlike LINEs and ERVs, Alu elements are CpG-rich over a longer stretch of sequence.
[59] However, this effect becomes apparent only over time: older Alu elements show more CpG loss in neighboring DNA than younger ones.