[1][2] The overarching goal was to apply high-throughput genome analysis techniques to improve the ability to diagnose, treat, and prevent cancer through a better understanding of the genetic basis of the disease.
A three-year pilot project, begun in 2006, focused on characterization of three types of human cancers: glioblastoma multiforme, lung squamous carcinoma, and ovarian serous adenocarcinoma.
[3] In 2009, it expanded into phase II, which planned to complete the genomic characterization and sequence analysis of 20–25 different tumor types by 2014.
[4][5] The project initially set out to collect and characterize 500 patient samples, more than most genomics studies of its time, and used a variety of different molecular techniques.
The goal of TCGA's pilot project was to establish an infrastructure to collect, molecularly characterize, and analyze 500 cancers and matched controls.
The work required extensive cooperation among a team of scientists from various institutions and assessment of multiple burgeoning high-throughput technologies.
[6] Three tumor types were explored during the pilot phase: glioblastoma multiforme (GBM), high-grade serous ovarian adenocarcinoma, and lung squamous carcinoma.
Members from the NCI and the NHGRI teams, along with principal investigators funded by the project, made up the Steering Committee.
TCGA was the first large-scale genomics project funded by the NIH to devote significant resources to bioinformatic discovery.
[10] The Biospecimen Core Resource (BCR) was responsible for verifying the quality and quantity of tissue shipped by tissue source sites, isolating DNA and RNA from the samples, performing quality control of these biomolecules, and shipping processed samples to the GSCs and GCCs.
There were two BCRs funded by NCI at the start of the full project: Nationwide Children's Hospital and the International Genomics Consortium.
NCI's Cancer Genomics Hub (CGHub) was the secure repository for storing, cataloging, and accessing sequence-related data.
[13] In 2008, TCGA published its first results on glioblastoma multiforme (GBM) in Nature.
A last batch of samples was excluded because the DNA or RNA collected was not of sufficient quality or quantity to be analyzed by all of the different platforms used in the study.
Since the publication of the first marker paper, several analysis groups within the TCGA Network have presented more detailed analyses of the glioblastoma data.
[52] The DNA methylation data analysis team, led by Houtan Noushmehr, PhD, and Peter Laird, PhD, identified a distinct subset of glioma samples that displays concerted hypermethylation at a large number of loci, indicating the existence of a glioma-CpG island methylator phenotype (G-CIMP).
They defined four subtypes of the cancer according to gene expression and DNA methylation patterns: immunoreactive, differentiated, proliferative, and mesenchymal.
TCGA reported on the exome sequence, DNA copy number, promoter methylation and messenger RNA characterization of 276 tumor samples of colon and rectal cancers in Nature in July 2012.
The study suggested new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.
Starting in 2011, TCGA began holding Annual Scientific Symposiums to discuss and share novel biological discoveries on cancer, analytical methods and translational approaches using the data.
By the project’s completion, TCGA had published “marker papers” describing the characterization and basic analyses of 33 cancer types.
ATAC-seq is a low-cost method for identifying regions of open or active chromatin and positions of DNA-binding proteins.
Through ATAC-seq, researchers were able to identify tens of thousands of potential DNA regulatory elements specific to different cancers and cell types.