[3] A small fraction of CpG islands overlap, or lie in close proximity to, the promoter regions of transcription start sites.
Currently, the major research interest lies in investigating disease conditions such as cancer to identify regions of the DNA that have undergone extensive methylation changes.
The genes contained in these regions are of functional interest, as they may offer a mechanistic explanation for the underlying genetic causes of a disease.
Typing technologies are targeted towards a small number of loci across many samples, and involve the use of techniques such as PCR, restriction enzymes, and mass spectrometry.
[14][15][16][17] Other methods for mapping and profiling the methylome have been effective, but they have limitations that can affect resolution or throughput, or introduce experimental variation.
[21] The short length of these fragments is important for obtaining adequate resolution, improving the efficiency of the downstream immunoprecipitation step, and reducing fragment-length effects or biases.
The classical immunoprecipitation technique is then applied: magnetic beads conjugated to anti-mouse-IgG are used to bind the anti-5mC antibodies, and unbound DNA is removed in the supernatant.
There are additional standard steps required in signal processing to correct for hybridization issues such as noise, as is the case with most array technologies.
The MeDIP-seq approach, i.e. the coupling of MeDIP with next-generation, short-read sequencing technologies such as 454 pyrosequencing or Illumina (Solexa), was first described by Down et al. in 2008.
[20] The high-throughput sequencing of the methylated DNA fragments produces a large number of short reads (36-50 bp[26] or 400 bp,[27] depending on the technology).
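As an illustration of how such short reads might be summarized once they have been aligned to a reference genome, the following minimal sketch counts reads in fixed-width windows along a single chromosome. The function name `window_counts`, the 500 bp window size, and the rule of assigning each read to the window containing its midpoint are assumptions made for this example only; they are not taken from any particular MeDIP-seq pipeline.

```python
from collections import Counter


def window_counts(read_starts, read_length, window=500):
    """Count aligned reads per fixed-width window on one chromosome.

    read_starts: 0-based alignment start positions (hypothetical input).
    read_length: read length in bp, e.g. 36 for short Illumina reads.
    window:      window width in bp (an arbitrary choice for this sketch).

    Each read is assigned to the window that contains its midpoint.
    """
    counts = Counter()
    for start in read_starts:
        midpoint = start + read_length // 2
        counts[midpoint // window] += 1
    return dict(counts)


# Four 36 bp reads; the keys of the result are window indices.
coverage = window_counts([0, 10, 490, 1000], read_length=36, window=500)
```

Windows with high counts relative to the background would be candidates for methylated regions, although real analyses also need to account for CpG density and sequencing depth.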
Once regions of DNA methylation are identified, a number of bioinformatics analyses can be applied to answer certain biological questions.
[29] By identifying mutational events leading to hypermethylation and subsequent repression of known tumour-suppressor genes, one can more specifically characterize the factors contributing to the disease.
One can also investigate whether an epigenetic regulator, such as DNA methyltransferase (DNMT), has been affected;[21] in these cases, enrichment may be more limited.
Gene-set analysis (for example, using tools such as DAVID and GoSeq) has been shown to be severely biased when applied to high-throughput methylation data (e.g. MeDIP-seq and MeDIP-ChIP); it has been suggested that this can be corrected by using sample label permutations or a statistical model that controls for differences in the number of CpG probes or CpG sites that target each gene.
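The sample label permutation idea can be sketched as follows. This is a simplified illustration, not the procedure of any published tool: the function names, the per-gene score (largest absolute case-control mean difference over a gene's probes), and the gene-set statistic (mean gene score) are all assumptions chosen for the example. The code also assumes that both sample groups are non-empty and that at least one gene in the set has a probe.

```python
import random
from statistics import mean


def gene_scores(values, labels, probe_gene):
    """Score each gene by the largest absolute case-control mean
    difference over its probes. Genes with more probes tend to
    receive higher scores by chance, which is the source of bias."""
    scores = {}
    for probe, gene in probe_gene.items():
        case = [v for v, lab in zip(values[probe], labels) if lab == 1]
        ctrl = [v for v, lab in zip(values[probe], labels) if lab == 0]
        diff = abs(mean(case) - mean(ctrl))
        scores[gene] = max(scores.get(gene, 0.0), diff)
    return scores


def set_statistic(scores, gene_set):
    """Mean score of the genes in the set that have at least one probe."""
    return mean(scores[g] for g in gene_set if g in scores)


def permutation_p_value(values, labels, probe_gene, gene_set,
                        n_perm=1000, seed=0):
    """Empirical p-value obtained by shuffling the sample labels.

    values:     dict probe -> list of methylation levels, one per sample
    labels:     list of 0/1 group labels, one per sample
    probe_gene: dict probe -> gene (hypothetical annotation)
    """
    rng = random.Random(seed)
    observed = set_statistic(gene_scores(values, labels, probe_gene), gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved
        null = set_statistic(gene_scores(values, shuffled, probe_gene), gene_set)
        if null >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because only the sample labels are shuffled, the probe-to-gene mapping is identical in the observed and permuted data, so a gene with many probes has the same advantage under the null distribution as in the observed statistic.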
However, this level of resolution may not be required for most applications, as the methylation status of CpG sites less than 1000 bp apart has been shown to be significantly correlated.
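One way to examine this kind of local correlation in a data set is to compute the Pearson correlation between the methylation levels of pairs of CpG sites that lie within a chosen distance of each other. The sketch below assumes a hypothetical input format (a position-sorted list of sites, each with one methylation level per sample) and assumes that no site has identical levels in every sample, since the correlation is undefined in that case.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def nearby_correlations(sites, max_dist=1000):
    """Correlate every pair of CpG sites less than max_dist bp apart.

    sites: list of (position, levels) tuples sorted by position, where
           levels holds one methylation value per sample.
    Returns a list of (position_i, position_j, correlation) tuples.
    """
    results = []
    for i, (pos_i, levels_i) in enumerate(sites):
        for pos_j, levels_j in sites[i + 1:]:
            if pos_j - pos_i >= max_dist:
                break  # sites are sorted, so later ones are farther away
            results.append((pos_i, pos_j, pearson(levels_i, levels_j)))
    return results
```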