The growth in popularity of RNA expression microarrays has been accompanied by concerns about the reliability of the data, especially when comparing between different platforms. Here, we present an evaluation of the reproducibility of microarray results using two platforms, Affymetrix GeneChips and Illumina BeadArrays. The study design is based on a dilution series of two human tissues (blood and placenta), tested in duplicate on each platform. The results of a comparison between the platforms indicate very high agreement, particularly for genes that are predicted to be differentially expressed between the two tissues. Agreement was strongly correlated with the level of expression of a gene. Concordance was also improved when probes on the two platforms could be identified as being likely to target the same set of transcripts of a given gene. These results shed light on the causes of failures of agreement across microarray platforms. The set of probes we found to be most highly reproducible can be used by others to help increase confidence in analyses of other data sets using these platforms.
For the Affymetrix data, the original raw data files (e.g., CEL files) are available on the GEO site.
For the Illumina data, the “rawest” data we have are here. This includes some extra summary statistics for each probe on each array, but not the individual bead intensity values, which were not retained by the Illumina analysts. The “mean” values in that file correspond to the values used as the starting point in our analysis.
The following files were used in the analyses presented in the paper:
- Affymetrix sequences used (These are the “collapsed” probe sequences)
- Illumina sequences used (50-base pair probes)
- BLAT results for Affymetrix
- BLAT results for Illumina
- Affymetrix annotations, BLAT-based
- Illumina annotations, BLAT-based
- Affymetrix data – RMA extraction.
- Illumina data – No background subtraction, quantile normalized. The unnormalized data are also available.
- Combined data files for clustering
- results table – contains many of the values used to make figures. See the README for an explanation.
- results-selectedForFollowup.xls – The pairs of probes yielding “unexplained” disagreement between the platforms.
- Placenta or blood-specific genes (Affymetrix probes or Illumina probes) – List of genes taken to be placenta- or blood-specific for the purposes of evaluation.
- README.R and Functions.R – R commands used in the project. These show exactly how most of the analyses were done. The script will not run without modification because it includes paths and some input files that are not part of the supplementary data (any additional files needed are available on request or from GEO). Functions.R contains some methods referred to in README.R.
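
Quantile normalization, which was applied to the Illumina data listed above, forces every array to share the same intensity distribution. The following is a minimal Python sketch of the general technique only; it is not the code used in the paper (the actual R commands are in README.R). The function name `quantile_normalize` and the toy values are illustrative assumptions, and ties are broken by original order, which may differ from how established implementations treat them.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one column per array).

    Illustrative sketch only: ties are broken by original position.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: the mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: two hypothetical arrays measuring three probes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(arrays)
```

After normalization, each array contains the same set of values (here 1.5, 3.5 and 5.5), assigned according to each probe's rank within its own array.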