The primary goal of the ENCODE project is to determine the role of the remaining, non-protein-coding portion of the genome, much of which was traditionally regarded as "junk".
The activity and expression of protein-coding genes can be modulated by the regulome, a variety of DNA elements such as promoters, transcriptional regulatory sequences, and regions of chromatin structure and histone modification.
It is thought that changes in the regulation of gene activity can disrupt protein production and cell processes and result in disease.
ENCODE is also intended as a comprehensive resource to allow the scientific community to better understand how the genome can affect human health, and to "stimulate the development of new therapies to prevent and treat these diseases".[5]
The ENCODE Consortium is composed primarily of scientists who were funded by the US National Human Genome Research Institute (NHGRI).
The goal of the pilot phase was to identify a set of procedures that, in combination, could be applied cost-effectively and at high-throughput to accurately and comprehensively characterize large regions of the human genome.[5]
The pilot phase tested and compared existing methods to rigorously analyze a defined portion of the human genome sequence.
The goal of these efforts was to identify a suite of approaches that would allow the comprehensive identification of all the functional elements in the human genome.
The ENCODE pilot project involved close interaction between computational and experimental scientists, who evaluated a number of methods for annotating the human genome.
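The pilot phase targeted 44 regions totalling about 30 Mb, roughly 1% of the human genome. These regions served as the foundation on which to test and evaluate the effectiveness and efficiency of a diverse set of methods and technologies for finding various functional elements in human DNA. About half of the 30 Mb consisted of 14 manually selected regions.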
The remaining 50% of the 30 Mb of sequence consisted of thirty 500 kb regions selected by a stratified random-sampling strategy based on gene density and the level of non-exonic conservation.
These criteria were chosen to ensure a good sampling of genomic regions that vary widely in their content of genes and other functional elements.
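A stratified random sample of this kind can be sketched in Python. The sketch assumes hypothetical inputs: candidate 500 kb windows already scored for gene density and non-exonic conservation, each score split into tertiles to give a 3 x 3 grid of strata, and the 30 picks spread as evenly as possible across those strata. The `Window` class, the tertile cut points and the even allocation are illustrative assumptions, not the pilot project's documented parameters.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

WINDOW_BP = 500_000  # 500 kb per sampled region


@dataclass(frozen=True)
class Window:
    """Hypothetical candidate window with two precomputed scores."""
    chrom: str
    start: int
    gene_density: float  # illustrative score, e.g. genes per Mb
    conservation: float  # illustrative score, e.g. conserved non-exonic fraction


def tertile_cuts(values):
    """Two cut points that split the sorted values into roughly equal thirds."""
    s = sorted(values)
    n = len(s)
    return s[n // 3], s[2 * n // 3]


def tertile(value, cuts):
    """Map a value to stratum 0 (low), 1 (medium) or 2 (high)."""
    low, high = cuts
    if value < low:
        return 0
    if value < high:
        return 1
    return 2


def stratified_sample(windows, n_total, seed=0):
    """Spread n_total picks as evenly as possible over the non-empty strata (at most 3 x 3)."""
    rng = random.Random(seed)
    g_cuts = tertile_cuts([w.gene_density for w in windows])
    c_cuts = tertile_cuts([w.conservation for w in windows])
    strata = defaultdict(list)
    for w in windows:
        key = (tertile(w.gene_density, g_cuts), tertile(w.conservation, c_cuts))
        strata[key].append(w)
    keys = sorted(strata)
    base, extra = divmod(n_total, len(keys))
    chosen = []
    for i, key in enumerate(keys):
        # The first `extra` strata take one additional pick each.
        quota = base + (1 if i < extra else 0)
        # A stratum with too few windows contributes all it has.
        chosen.extend(rng.sample(strata[key], min(quota, len(strata[key]))))
    return chosen


# Toy genome: 900 windows on a 30 x 30 grid of (gene density, conservation).
windows = [Window("chrToy", i * WINDOW_BP, float(i % 30), float(i // 30))
           for i in range(900)]
picked = stratified_sample(windows, n_total=30, seed=1)
print(len(picked), len(picked) * WINDOW_BP)  # 30 15000000 (15 Mb, half of 30 Mb)
```

Stratifying before sampling guarantees that low-, medium- and high-scoring windows are all represented, which a simple random draw of 30 windows would not.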
[22] The authors described the production and the initial analysis of 1,640 data sets designed to annotate functional elements in the entire human genome, integrating results from diverse experiments within cell types, related experiments involving 147 different cell types, and all ENCODE data with other resources, such as candidate regions from genome-wide association studies (GWAS) and evolutionarily constrained regions.
The most important new elements of the "encyclopedia" include:
Capturing, storing, integrating, and displaying the diverse data generated is challenging.
ENCODE is organized as a "community resource project", a concept defined at an international meeting held in Ft. Lauderdale in January 2003 as a research project specifically devised and implemented to create a set of data, reagents, or other material whose primary utility will be as a resource for the broad scientific community.
[30] The extension of ENCODE to model organisms (modENCODE) permits biological validation of the computational and experimental findings of the ENCODE project, something that is difficult or impossible to do in humans.
In late 2010, the modENCODE consortium unveiled its first set of results with publications on annotation and integrative analysis of the worm and fly genomes in Science.
A follow-on project, modERN (model organism Encyclopedia of Regulatory Networks), has merged the C. elegans and Drosophila groups and focuses on identifying additional transcription factor binding sites in each organism.
fruitENCODE (an encyclopedia of DNA elements for fruit ripening) is a plant ENCODE project that aims to generate DNA methylation, histone modification, DNase I hypersensitive site (DHS), gene expression, and transcription factor binding datasets for all fleshy fruit species at different developmental stages.
Although the consortium states that it is far from finished with the ENCODE project, many reactions to the published papers and the news coverage that accompanied the release were favorable.
The Nature editors and ENCODE authors "... collaborated over many months to make the biggest splash possible and capture the attention of not only the research community but also of the public at large".
[47][48][49] The somewhat arbitrary choice of cell lines and transcription factors and the lack of appropriate control experiments were additional major criticisms of ENCODE, because random DNA can mimic ENCODE-like 'functional' behavior.
[52] Furthermore, much of the genome whose function is disputed by critics seems to be involved in epigenetic regulation of processes such as gene expression, and appears to be necessary for the development of complex organisms.
[54] Recently, ENCODE researchers reiterated that the project's main goal is to identify functional elements in the human genome.
[54] Ewan Birney, one of the ENCODE researchers, commented that "function" was used pragmatically to mean "specific biochemical activity", which covered several classes of assay: RNA, "broad" histone modifications, "narrow" histone modifications, DNase I hypersensitive sites, transcription factor ChIP-seq peaks, DNase I footprints, transcription factor-bound motifs, and exons.
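One way to read this pragmatic definition is that a base counts as having "specific biochemical activity" if it lies inside the output of any of the assay classes, so the share of a sequence called functional is the size of the union of all assay intervals. The Python sketch below illustrates that union calculation; the interval coordinates, class names and sequence length are made up, coordinates are assumed to be half-open `[start, end)` on a single chromosome, and the code is not ENCODE's actual pipeline.

```python
from collections.abc import Iterable

Interval = tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: Iterable[Interval]) -> list[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(assays: dict[str, list[Interval]], length: int) -> float:
    """Fraction of a sequence of `length` bp covered by at least one assay class."""
    all_intervals = [iv for ivs in assays.values() for iv in ivs]
    covered = sum(end - start for start, end in merge_intervals(all_intervals))
    return covered / length


# Hypothetical toy data for a 1,000 bp sequence.
assays = {
    "RNA": [(0, 400), (600, 700)],
    "DNase I hypersensitive sites": [(350, 450)],
    "transcription factor ChIP-seq peaks": [(680, 720), (900, 950)],
}
print(covered_fraction(assays, 1000))  # 0.62
```

Here RNA alone covers 50% of the toy sequence, but the union across classes covers 62%, which shows why a broad, any-assay definition yields a larger "functional" fraction than any single assay class.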
[57] In 2014, ENCODE researchers noted that previous studies in the literature have identified functional parts of the genome differently, depending on the approaches used.
[59] The analysis of transcription factor binding data generated by the ENCODE project is currently available in the web-accessible repository FactorBook.