Mouse brain data analysis

This web site is a supplement to the paper "Statistical analysis of strain and regional variation in gene expression in mouse brain" by Paul Pavlidis and William Stafford Noble, published in Genome Biology. The pdf version of the paper is available from the publisher.

The primary purpose of this site is to provide the complete data from the work in an interactive format, and raw results files for use by other researchers. Some additional data is also provided here.

Software availability

A working version of the ANOVA and template-matching software is now available here.

The tools are Perl scripts that run under UNIX. Potential users should be aware that the ANOVA scripts do not handle many of the complex situations that can arise. Currently, only t-tests, one-way ANOVA, and two-way ANOVA (with and without replication) are supported.
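For readers unfamiliar with the simplest supported case, the following is a minimal sketch of a one-way ANOVA F statistic for a single gene. It is written in Python for illustration and is not the Perl code distributed here; the function name `one_way_anova_f` and the expression values are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements.

    Each group holds the replicate measurements for one level of the factor
    (for example, one brain region).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical expression values for one gene in three groups, three replicates each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 6) = 13.00
```

A p-value would then be read from the F distribution with the reported degrees of freedom. With exactly two groups, F equals the square of the pooled-variance t statistic, which is why the t-test can be treated as a special case.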

In addition, the software we used to make the figures, matrix2png, is also available for download.

Abstract

Background:

We performed a statistical analysis of a previously published set of gene expression microarray data from six different brain regions in two mouse strains. In the previously published analysis of these data, 24 genes showing expression differences between the strains were identified, while about 240 genes were found to show regional differences in expression. Like many gene expression studies, the previous analysis relied primarily on ad hoc "fold change" and "absent/present" criteria to select genes. To determine whether statistically motivated methods would permit a more sensitive and selective analysis of gene expression patterns in the brain, we used analysis of variance (ANOVA) and feature selection methods designed to select genes showing strain- or region-dependent patterns of expression.

Results:

Our analysis reveals a large number of new candidate genes for involvement in behavioral differences between the two mouse strains and functional differences among the six brain regions. Using conservative statistical criteria, we identified at least 63 genes showing strain variation and approximately 600 genes showing regional variation. Unlike the ad hoc methods, our methods have the additional benefit of ranking the genes by statistical score, permitting further analysis to focus on the most significant genes. A comparison of our results to the previous studies and to published reports on individual genes shows that we achieved high sensitivity while preserving selectivity.

Conclusions:

Our results indicate that the molecular differences between the strains and regions studied are larger than originally indicated by the previous studies. We also conclude that for large, complex data sets, ANOVA and feature selection, alone or in combination, are more powerful tools than methods based on "fold change" thresholds and other ad hoc gene selection criteria.
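As an illustration of ranking genes by a statistical score, the sketch below scores each gene by the Pearson correlation between its expression profile and an idealized template, then sorts by absolute correlation. This is one common form of template matching and is an assumption about the general technique, not a copy of the scripts offered on this site. The gene names, expression values, and the template (low in the first three samples, high in the last three) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_by_template(profiles, template):
    """Return (gene, correlation) pairs sorted by |correlation|, strongest first."""
    scores = {gene: pearson(values, template) for gene, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: six samples per gene, template contrasts samples 1-3 with 4-6.
template = [0, 0, 0, 1, 1, 1]
profiles = {
    "geneA": [1, 1, 1, 5, 5, 5],  # matches the template
    "geneB": [3, 2, 4, 3, 2, 4],  # unrelated to the template
    "geneC": [5, 5, 4, 1, 1, 2],  # inverse of the template
}

for gene, r in rank_by_template(profiles, template):
    print(f"{gene}\t{r:+.3f}")
```

Sorting by absolute value keeps genes that follow the inverse of the template near the top of the list; the sign of the correlation then indicates the direction of the difference.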

Latest updates
(4/2006)
  • Updated broken links.
  • Provided link to data file.
(8/24/2002)
  • Fixed a number of problems with links, paths, and scripts due to moving the site to the new server. Thanks to the users who brought these problems to our attention.
(9/2/2001)
  • Updated the site in preparation for publication.
(6/2/2001)
  • Updated the site in preparation for manuscript submission.

References

Sandberg et al. Regional and strain-specific gene expression mapping in the adult mouse brain. (pdf) The data were formerly available by FTP via ftp://ftp.gnf.org/pub/papers/brainstrain/ but have since been removed or relocated.
Sandberg. Gene expression profiling of brain regions in inbred mouse strains reveals candidate genes for phenotypic variation. Master's thesis, Karolinska Institute, Stockholm (2000) (pdf).