
While the completion of the Human Genome Project gave researchers an enormous amount of new data, it also left them with the problem of how to interpret and analyze it. To seek out genes associated with diseases, researchers used DNA microarrays, which measure the amount of gene expression in different cells. Researchers would run these microarrays on thousands of different genes and compare the results between two different cell categories. However, this gene-by-gene comparison is not sensitive enough to detect subtle differences: diseases typically involve entire groups of genes, each of which may change only slightly in expression. Multiple genes are linked to a single biological pathway, so it is the additive change in expression within a gene set that leads to the difference in phenotypic expression. Gene set enrichment analysis was developed to focus on changes in the expression of groups of a priori defined gene sets. By doing so, the method resolves the problem of small, individually undetectable changes in the expression of single genes.

Gene set enrichment analysis uses a priori gene sets that have been grouped together by their involvement in the same biological pathway or by proximal location on a chromosome. A database of these predefined sets can be found at the Molecular Signatures Database (MSigDB). In GSEA, DNA microarrays, or now RNA-Seq, are still performed and compared between two cell categories, but instead of focusing on individual genes in a long list, the analysis focuses on gene sets.
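The text does not say how a whole gene set is scored. One widely used choice, in the style of the running-sum statistic of the original GSEA method, walks down a ranked gene list and keeps a running sum that rises at genes in the set and falls at genes outside it; the enrichment score is the largest deviation from zero. The sketch below assumes a precomputed per-gene ranking metric (for example, a t statistic between the two cell categories) and a weight exponent p; the function name enrichment_score and the toy data are hypothetical, and the permutation-based significance test is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score of one gene set against a ranked gene list.

    ranked_genes: gene identifiers sorted from most up- to most down-regulated
                  between the two cell categories (assumed precomputed).
    scores:       the per-gene ranking metric, in the same order.
    gene_set:     the a priori gene set being tested.
    p:            weight exponent; p=0 gives an unweighted Kolmogorov-Smirnov walk.
    """
    if len(ranked_genes) != len(scores):
        raise ValueError("ranked_genes and scores must have the same length")
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    # Hits are weighted by |score|**p and normalised so their steps sum to 1.
    hit_norm = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have zero weight")
    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** p / hit_norm  # step up at a gene in the set
        else:
            running -= 1.0 / n_miss            # step down at a gene outside it
        if abs(running) > abs(best):
            best = running                     # keep the largest deviation from zero
    return best


ranked = ["A", "B", "C", "D", "E", "F"]   # hypothetical ranked genes
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(ranked, metric, {"A", "B"}))  # set at the top → 1.0
print(enrichment_score(ranked, metric, {"E", "F"}))  # set at the bottom → -1.0
```

A positive score means the set's genes cluster near the top of the ranking (up-regulated in the first category), a negative score means they cluster near the bottom. Weighting by |score| lets strongly changed genes count more, which is one way to capture the additive effect of many small changes within a set.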


Gene set enrichment analysis (GSEA), also called functional enrichment analysis, is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins and may have an association with disease phenotypes. The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics and proteomics experiments often identify thousands of genes, which are then used for the analysis.

Researchers performing high-throughput experiments that yield sets of genes (for example, genes that are differentially expressed under different conditions) often want to retrieve a functional profile of that gene set in order to better understand the underlying biological processes. This can be done by comparing the input gene set to each of the bins (terms) in the Gene Ontology; a statistical test is performed for each bin to determine whether it is enriched for the input genes.
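The per-bin test is not named in the text. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) for each term, followed by a multiple-testing correction, because one test is run per term. The sketch below assumes annotations are supplied as a plain dictionary from term to gene set and that the background is the set of all measured genes; the function names, term names and toy data are hypothetical, and Benjamini-Hochberg is only one of several possible corrections.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the term, n drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def term_enrichment(input_genes, annotations, background):
    """One-sided over-representation test for each term (bin).

    annotations: dict mapping a term to the set of genes annotated with it.
    background:  set of all genes that could have appeared in the input.
    Returns {term: (overlap, term_size, p_value)}.
    """
    query = set(input_genes) & background
    N, n = len(background), len(query)
    results = {}
    for term, members in annotations.items():
        members = set(members) & background
        K = len(members)
        k = len(query & members)
        results[term] = (k, K, hypergeom_sf(k, N, K, n))
    return results


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values for a {term: p} dict."""
    ordered = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ordered)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        term, p = ordered[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[term] = running_min
    return adjusted


background = {f"G{i}" for i in range(20)}
annotations = {
    "T1": {f"G{i}" for i in range(5)},       # hypothetical term, 5 genes
    "T2": {f"G{i}" for i in range(10, 15)},  # hypothetical term, 5 genes
}
hits = ["G0", "G1", "G2", "G3"]              # hypothetical input gene set

results = term_enrichment(hits, annotations, background)
adjusted = benjamini_hochberg({t: r[2] for t, r in results.items()})
for term, (k, K, p) in results.items():
    print(term, k, K, p, adjusted[term])
```

Here all four input genes fall in T1, which has only 5 of the 20 background genes, so T1 receives a small p-value (5/4845), while T2 shares no genes with the input and gets p = 1.0. The choice of background matters: testing against all genes in the genome rather than all genes actually measured tends to inflate enrichment.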

Figure: Schematic overview of the modular structure underlying procedures for gene set enrichment analysis.
