If the gene set falls at either the top (over-expressed) or the bottom (under-expressed) of the ranked gene list, it is thought to be related to the phenotypic differences.
These criticisms led to the use of the correlation-weighted Kolmogorov–Smirnov test, the normalized ES, and the false discovery rate calculation, all of which now define standard GSEA.
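The correlation-weighted running-sum statistic and its normalization can be illustrated with a minimal Python sketch. This is an illustration only, not the Broad Institute implementation: the function names, the weighting exponent `p`, and the toy data are assumptions, and the null distribution here is built from random gene sets of the same size, which is a simplification of the phenotype-label permutation normally used.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum over a ranked gene list.

    ranked_genes: gene names ordered from most over- to most under-expressed.
    scores: correlation of each gene with the phenotype, in the same order.
    p: weighting exponent (hypothetical parameter name; p=0 gives the
       unweighted Kolmogorov-Smirnov statistic).
    """
    scores = np.asarray(scores, dtype=float)
    in_set = np.array([g in gene_set for g in ranked_genes], dtype=bool)
    n_hit = int(in_set.sum())
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    weights = np.abs(scores) ** p * in_set
    if weights.sum() == 0:
        raise ValueError("all genes in the set have zero weight")
    hit_step = weights / weights.sum()      # step up at genes in the set
    miss_step = (~in_set) / n_miss          # step down at genes outside it
    running = np.cumsum(hit_step - miss_step)
    # ES is the maximum deviation of the running sum from zero, sign kept.
    return float(running[np.argmax(np.abs(running))])

def normalized_es(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Return (ES, NES, empirical p-value) using random same-size gene sets."""
    rng = np.random.default_rng(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    k = sum(g in gene_set for g in ranked_genes)
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.choice(len(ranked_genes), size=k, replace=False)
        random_set = {ranked_genes[j] for j in idx}
        null[i] = enrichment_score(ranked_genes, scores, random_set)
    # Normalize against null scores that share the sign of the observed ES.
    same_sign = null[np.sign(null) == np.sign(observed)]
    if same_sign.size == 0:
        return observed, float("nan"), float("nan")
    nes = observed / float(np.abs(same_sign).mean())
    p_value = float((np.abs(same_sign) >= abs(observed)).mean())
    return observed, nes, p_value

# Toy data (made up): ten genes ranked by correlation with the phenotype.
genes = [f"g{i}" for i in range(1, 11)]
corr = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]
top_set = {"g1", "g2", "g3"}       # clustered at the top of the ranking
bottom_set = {"g8", "g9", "g10"}   # clustered at the bottom of the ranking
es_top = enrichment_score(genes, corr, top_set)        # positive ES
es_bottom = enrichment_score(genes, corr, bottom_set)  # negative ES
```

Dividing by the mean of the same-sign null scores makes scores comparable across gene sets of different sizes, which is what allows a false discovery rate to be computed over many sets at once.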
The method's founders claim that it is a better way to find associations between MSigDB gene sets and microarray data.
GSEA has become standard practice, and there are many websites and downloadable programs that will provide the data sets and run the analysis.
Multi-Ontology Enrichment Tool (MOET) is a web-based ontology analysis tool that provides functionality for multiple ontologies, including Disease, GO, Pathway, Phenotype, and Chemical entities (ChEBI) for multiple species, including rat, mouse, human, bonobo, squirrel, dog, pig, chinchilla, naked mole-rat and vervet (green monkey).
It is simple to use, and results are provided with a few clicks in seconds; no software installations or programming skills are required.
[13] The Molecular Signatures Database hosts an extensive collection of annotated gene sets that can be used with most GSEA software.
[14] The Broad Institute website works in cooperation with MSigDB and offers downloadable GSEA software as well as a general tutorial.
Analysis can be performed against 12 organisms and 321,251 functional categories using 354 gene identifiers from various databases and technology platforms.
It contains background libraries for transcription regulation, pathways and protein interactions, ontologies including GO and the human and mouse phenotype ontologies, signatures from cells treated with drugs, gene sets associated with human diseases, and expression of genes in different cells and tissues.
[20] GeneSCF is a real-time based functional enrichment tool with support for multiple organisms[21] and is designed to overcome the problems associated with using outdated resources and databases.
[22] GeneSCF offers several advantages: analysis runs in real time, so users do not have to depend on enrichment tools being updated; computational biologists can easily integrate it into their NGS pipelines; it supports multiple organisms; it can run enrichment analysis on multiple gene lists against multiple source databases in a single run; and it can retrieve or download complete GO terms, pathways, or functions with their associated genes as a simple table in a plain-text file.
[32][33] Its primary purpose is to identify pathways and processes that are significantly associated with factor regulating activity.
[37] Instances of InterMine automatically provide enrichment analysis [38] for uploaded sets of genes and other biological entities.
[39] Developed and maintained by the Division of Biomedical Informatics at Cincinnati Children's Hospital Medical Center.
[40] QuSAGE improves power by accounting for inter-gene correlations and quantifies gene set activity with a complete probability density function (PDF).
Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability.
The applicability of QuSAGE has been extended to longitudinal studies by adding functionality for general linear mixed models.
[41] QuSAGE was used by the NIH/NIAID to identify baseline transcriptional signatures that were associated with human influenza vaccination responses.
g:Profiler supports close to 500 species and strains, including vertebrates, plants, fungi, insects and parasites.
Before GSEA, the accuracy of genome-wide SNP association studies was severely limited by a high number of false positives.
[47] The GSEA-SNP method is based on the theory that the SNPs contributing to a disease tend to be grouped in a set of genes that are all involved in the same biological pathway.
This application of GSEA not only aids in the discovery of disease-associated SNPs but also helps illuminate the corresponding pathways and mechanisms of the diseases.
[48] Exome sequences from women who had experienced SPTB were compared to those from women in the 1000 Genomes Project, using a tool that scored possible disease-causing variants.
This study found that the variants were significantly clustered in sets related to several pathways, all of which are suspected to play a role in SPTB.
[48] Gene set enrichment analysis can be used to understand the changes that cells undergo during carcinogenesis and metastasis.
[49] This analysis showed significant changes of expression in genes involved in pathways that have not been previously associated with the progression of renal cancer.
DNA methylation is the best-studied epigenetic change, and it has been analyzed using GSEA in relation to schizophrenia-related intermediate phenotypes.
Previous studies have shown that long-term depression symptoms are correlated with changes in immune response and inflammatory pathways.
This study found that the participants rated as having the most severe depression symptoms also had significant expression differences in those gene sets, a result that supports the association hypothesis.