Simple statistics tools for gene expression arrays

Preamble: These tools perform some simple statistical analyses using Perl scripts. I stopped short of 'reinventing the wheel': there are statistical packages that can handle pretty much anything you throw at them and perform much more sophisticated analyses than those I provide here. If you are looking for more sophisticated statistical tools, I recommend checking out R.

Important notice:
Everything here is offered 'as is'. While I have tried to write software that works correctly, there is no warranty with respect to the correctness of the results provided by this software or its suitability for a particular purpose. The software can be freely used and distributed as long as appropriate attribution is provided (in the form of a citation).

Please cite the following paper if you use this software in your studies:

Pavlidis and Noble (2001) Analysis of strain and regional variation in gene expression in mouse brain. Genome Biology 2001;2(10):RESEARCH0042.

Latest Update information

Please send me an email if you want to be notified of updates and bug fixes as they are posted here. Otherwise, check the web site periodically for updates. I use this software all the time myself, so I am constantly improving it and fixing things.

  • June 6 2002: Fixed a bug that allowed roundoff errors resulting in 'negative p-values' to escape.
  • November 6 2001: A better p-value calculation method improves the precision of t-tests and pattern matching. This only affects situations involving p-values less than about 10^-14.
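Both update notes concern floating-point roundoff in very small p-values. The following is a minimal sketch of that general effect, written in Python rather than Perl and not taken from Stats.pm. It assumes a standard normal distribution as a stand-in (the scripts themselves use other distributions), and the function names are hypothetical. It shows why computing a tail probability as "1 minus something" loses precision for tiny p-values, and how a final clamp keeps a reported p-value inside [0, 1].

```python
import math


def clamp_pvalue(p):
    """Force a computed p-value into [0, 1], absorbing tiny roundoff overshoots."""
    return min(1.0, max(0.0, p))


def naive_two_sided_p(z):
    """Two-sided normal p-value as 1 - erf(|z|/sqrt(2)).

    The subtraction from 1 cannot resolve values far below about 1e-16.
    """
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))


def tail_two_sided_p(z):
    """The same quantity computed directly from the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = 9.0
print(naive_two_sided_p(z))   # precision is lost in the subtraction
print(tail_two_sided_p(z))    # the tail is computed directly and stays positive
print(clamp_pvalue(-1e-17))   # a slightly negative result is reported as 0.0
```

Computing the tail directly is generally preferable to clamping alone, since clamping hides the symptom but does not restore the lost digits.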

What to do in case of a bug or problem

Notify me and I will try to figure out what is wrong. If you have a SMALL data file that reproduces the problem, include it, and the layout file, as well as a listing of the shell session during which the error occurred.

System requirements

The software provided here is written in Perl and has only been used on Unix systems (Solaris and Linux).

Download

Click here to download the entire set of scripts. You'll have to refer back here for documentation.

Core documentation

These files apply to much of what is described here.

Sample files

These can be used to test some of the software.

Core libraries

  • Stats.pm: This is needed to run the scripts. (download) (docs)

Analysis of variance and t-tests

  • Read about the experiment 'layout' file format.
  • ttest: Do a two-sided t-test with or without Welch correction. This program was recently extended with some nonparametric tests as options (the Mann-Whitney "U" test and the rank-transformed t-test). Some of these tests are easy to do in Excel, but some users may find this useful. (download) (docs)
  • anova-oneway: Do a one-way analysis of variance with a balanced design. (download) (docs)
  • anova-twoway-norep: Do a two-way analysis of variance, when there are no replicates. (download) (docs)
  • anova-twoway-withrep: Do a two-way analysis of variance, when there are replicates with a balanced design. (download) (docs)
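For readers who want to check results by hand, here is a minimal sketch of the textbook test statistics behind the t-test and the one-way analysis of variance. It is written in Python rather than Perl, it is not the code used by the scripts above, and the function names are hypothetical. It computes only the statistics and their degrees of freedom. Turning them into p-values requires the t and F distributions, which are not in Python's standard library.

```python
from statistics import mean, variance


def t_statistic(a, b, welch=True):
    """Return (t, degrees_of_freedom) for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)
    if welch:
        # Welch correction: do not assume equal variances in the two groups.
        sa, sb = va / na, vb / nb
        t = diff / (sa + sb) ** 0.5
        # Welch-Satterthwaite approximation to the degrees of freedom.
        df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    else:
        # Classical Student t-test with a pooled variance estimate.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        t = diff / (pooled * (1 / na + 1 / nb)) ** 0.5
        df = na + nb - 2
    return t, df


def oneway_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Example: expression values for one gene in two groups of five samples.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
print(t_statistic(group_a, group_b, welch=True))
print(t_statistic(group_a, group_b, welch=False))
print(oneway_f([group_a, group_b]))
```

With exactly two groups, the one-way F statistic equals the square of the pooled-variance t statistic, which makes a convenient sanity check.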

Pattern or template matching

See the paper describing this in more detail. (pdf) (site)