About Gemma


Gemma is an open-source database and software system for the meta-analysis of gene expression data. Gemma contains data from hundreds of public microarray data sets, referencing hundreds of published papers. Gemma is registered with NIF.

Copyright © 2007-2011

Gemma's Goals

The goals of Gemma are to:

  • Aggregate public gene expression studies for the purposes of reanalysis. Our current focus is on human, mouse and rat nervous system data, but Gemma supports data from any species.
  • Perform analyses on the expression data and store the results for rapid and easy access.
  • Reanalyze probe sequences to allow consistent cross-platform comparisons.
  • Serve as a framework for the development and evaluation of novel algorithms for expression analysis and meta-analysis.
  • Support a range of analytical approaches including but not limited to coexpression and differential expression.
  • Support a variety of expression technologies, including Affymetrix, Illumina and other oligonucleotide arrays, one-channel and ratiometric cDNA arrays, and sequence-based techniques.
  • Provide consistent annotation/descriptions of data sets to enhance usability of the data.
  • Provide external sites with processed data, web services, software and other means of including Gemma tools and data in other systems.
  • Allow users to upload expression data for inclusion in Gemma, either "privately" for their own analyses or to be made public.
  • Provide a software system that can be installed and used by others.

To these ends, Gemma consists of:

  • A database and software framework for storing and accessing expression data with a focus on analysis. The software is released under the Apache 2.0 license, which is liberal enough to allow commercialization, modification, and reuse by anybody.
  • A set of administrative and curation tools for loading, tracking, processing, annotation, analysis and quality control of expression data.
  • A sequence analysis pipeline for probe sequences.
  • Implementations of algorithms for analysis of expression data.


Main analyses

Gemma provides two basic types of analysis of each data set: coexpression and differential expression. The emphasis is on being able to search and compare results across data sets.

Coexpression

The methods used for this "recurring coexpression" analysis are essentially as described in Lee et al., 2004. Briefly, each data set is preprocessed and analyzed to identify pairs of genes that are strongly coexpressed. The on-line analysis searches these results for pairs of genes that are coexpressed in multiple data sets, starting from a gene you choose.
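
The idea of recurring coexpression can be illustrated with a minimal sketch. This is not Gemma's code: the function names, the use of Pearson correlation, the correlation threshold of 0.9 and the rule of keeping only positively correlated pairs are all assumptions made for illustration. Each data set is represented as a mapping from gene name to a list of expression values across samples.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpressed_pairs(dataset, threshold=0.9):
    """Step 1: within one data set, collect gene pairs with correlation >= threshold.

    The threshold and the positive-only rule are illustrative assumptions.
    """
    pairs = set()
    for g1, g2 in combinations(sorted(dataset), 2):
        if pearson(dataset[g1], dataset[g2]) >= threshold:
            pairs.add((g1, g2))
    return pairs


def recurring_partners(query, datasets, threshold=0.9, min_support=2):
    """Step 2: count, for a chosen query gene, in how many data sets each
    partner gene is coexpressed with it, and keep partners that recur."""
    support = {}
    for dataset in datasets:
        if query not in dataset:
            continue
        for g1, g2 in coexpressed_pairs(dataset, threshold):
            if query in (g1, g2):
                partner = g2 if g1 == query else g1
                support[partner] = support.get(partner, 0) + 1
    return {gene: n for gene, n in support.items() if n >= min_support}
```

Calling `recurring_partners("A", datasets)` returns a mapping from partner gene to the number of data sets supporting the link. A real system would compute the per-data-set results once and store them, so that the search step only has to look up stored pairs.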

Differential expression

Data sets are considered for differential expression analysis if they have a factorial design with up to two levels, and appropriate replication and block-completeness to allow an analysis to proceed. Data sets are analyzed using t-tests, one-way ANOVA, or two-way ANOVA with or without interaction, as appropriate to the design.
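
How a test might be matched to a design can be sketched as follows. This is a hypothetical helper, not part of Gemma. It assumes the conventional mapping: one factor with two groups gives a t-test, one factor with more than two groups gives a one-way ANOVA, and two factors give a two-way ANOVA. The function name and the `include_interaction` parameter are invented for illustration, and the checks for replication and block-completeness are omitted.

```python
def choose_test(design, include_interaction=False):
    """Pick a test name for an experimental design.

    `design` maps each factor name to the list of group labels it takes,
    e.g. {"treatment": ["control", "drug"]}. The mapping from design to
    test is the conventional one and is an assumption of this sketch.
    """
    if not design:
        raise ValueError("design has no factors")
    group_counts = {name: len(set(groups)) for name, groups in design.items()}
    for name, count in group_counts.items():
        if count < 2:
            raise ValueError(f"factor {name!r} needs at least two groups")

    if len(design) == 1:
        (count,) = group_counts.values()
        return "t-test" if count == 2 else "one-way ANOVA"
    if len(design) == 2:
        if include_interaction:
            return "two-way ANOVA with interaction"
        return "two-way ANOVA without interaction"
    raise ValueError("designs with more than two factors are not handled here")
```

For example, `choose_test({"treatment": ["control", "drug"]})` selects a t-test, while adding a second factor such as `"sex": ["F", "M"]` selects a two-way ANOVA.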