Guilt by association in gene networks

Supplement to “Guilt by association” is the exception rather than the rule in gene networks.
Gillis J, Pavlidis P (PubMed, reprint)


Hello, you’ve reached the “critical connections” project page. Helpful lists and code follow below. If you need further assistance with implementation or discussion, please contact Paul, and we’ll be happy to help out.


Gene networks are commonly interpreted as encoding functional information in their connections. An extensively validated principle, guilt by association, states that genes which are associated or interacting are more likely to share function. Guilt by association provides the central top-down principle for analyzing gene networks in functional terms or assessing their quality in encoding functional information. In this work, we show that functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network. In effect, the apparent encoding of function within networks has been largely driven by outliers whose behaviour cannot even be generalized to individual genes, let alone to the network at large. While experimentalist-driven analysis of interactions may use prior expert knowledge to focus on the small fraction of critically important data, large-scale computational analyses have typically assumed that high-performance cross-validation in a network is due to a generalizable encoding of function. Because we find that gene function is not systemically encoded in networks, but dependent on specific and critical interactions, we conclude it is necessary to focus on the details of how networks encode function and what information computational analyses use to extract functional meaning. We explore a number of consequences of this, and find that network structure itself provides clues as to which connections are critical and that systemic properties, such as scale-free-like behaviour, do not map onto the functional connectivity within networks.
Structured data are supplied below in MATLAB format, followed by text files. Please contact us for any additional details or for data in other formats.


The article can be found at PLoS Computational Biology (open access).

See also

Multifunctionality drives gene characterization: A re-evaluation of hubs and promiscuity in gene function prediction



Gene lists:
These are the gene lists (UCSC Golden Path for human, NCBI for yeast, MGI for mouse) used in the paper, with gene symbols, NCBI IDs, and Gemma gene IDs (useful for accessing Gemma web services).


Human list text
Mouse list (Mousefunc) text
Yeast list text


Gene Ontology matrices (with descriptions, IDs, etc):


Yeast data text
Mouse data (Mousefunc) text
Human data text


Text files contain pairs of gene name and accession ID.
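
As a rough illustration of how such a pair file might be turned into a gene-by-term matrix, here is a minimal Python sketch. It assumes one tab-separated pair per line (gene name, then accession ID); the actual delimiter and column order of the files should be checked before use. The function name `load_annotation_pairs` and the toy gene and term names are hypothetical and are not part of the distributed code.

```python
import io

def load_annotation_pairs(handle, sep="\t"):
    """Build a binary gene-by-term matrix from (gene, accession) pair lines.

    Assumption: one pair per line, gene name first, accession ID second.
    """
    pairs = []
    for line in handle:
        fields = line.strip().split(sep)
        if len(fields) >= 2 and fields[0] and fields[1]:
            pairs.append((fields[0], fields[1]))

    genes = sorted({gene for gene, _ in pairs})
    terms = sorted({term for _, term in pairs})
    gene_index = {gene: i for i, gene in enumerate(genes)}
    term_index = {term: j for j, term in enumerate(terms)}

    # Rows are genes, columns are terms; 1 marks an annotation.
    matrix = [[0] * len(terms) for _ in genes]
    for gene, term in pairs:
        matrix[gene_index[gene]][term_index[term]] = 1
    return genes, terms, matrix

# Toy input standing in for one of the text files (made-up names).
toy_file = io.StringIO("geneA\tTERM_1\ngeneB\tTERM_1\ngeneB\tTERM_2\n")
genes, terms, annotations = load_annotation_pairs(toy_file)
```

For a real file, the `io.StringIO` object would be replaced by an open file handle.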


Exceptional Edges matrices:

These are the interaction matrices, with each gene replaced by a score indicating its aggregate exceptionality. The underlying data (which give criticality for each GO group, as opposed to an aggregate score) are several gigabytes in size. The exceptionality matrices are sufficient to replicate almost all the results of the paper but, unfortunately, do not let one examine the criticality of individual GO subnetworks.


Yeast exceptional ppin text
Human exceptional ppin text


Text files contain interaction pairs ordered by exceptionality.
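
One way to work with a ranked pair list is to set aside the top-ranked interactions and rebuild the network from the remainder. The sketch below is only an illustration. It assumes tab-separated gene pairs, one per line, with the most exceptional interaction first; both the delimiter and the direction of the ordering should be verified against the files. The helper names and toy genes are hypothetical.

```python
import io

def read_ranked_edges(handle, sep="\t"):
    """Read (gene_a, gene_b) pairs, keeping file order (assumed rank order)."""
    edges = []
    for line in handle:
        fields = line.strip().split(sep)
        if len(fields) >= 2 and fields[0] and fields[1]:
            edges.append((fields[0], fields[1]))
    return edges

def build_adjacency(edges):
    """Undirected adjacency: each gene maps to the set of its neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

# Toy ranked list with made-up gene names.
toy_file = io.StringIO(
    "geneA\tgeneB\ngeneB\tgeneC\ngeneC\tgeneD\ngeneA\tgeneD\n"
)
ranked = read_ranked_edges(toy_file)

# Set aside the n_top highest-ranked interactions and keep the rest.
n_top = 1
top_edges, remaining_edges = ranked[:n_top], ranked[n_top:]
pruned_network = build_adjacency(remaining_edges)
```

Comparing prediction performance on the full and pruned networks is one way to see how much depends on the top-ranked interactions.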


Cross-validation calculation:


Neighbour voting cross-validation (ROC)
Neighbour voting cross-validation (Precision-recall style b)
Neighbour voting cross-validation (Precision-recall style c)
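

For readers without MATLAB, the following Python sketch shows the general idea of neighbour voting with cross-validation and an ROC summary. It is not the code linked above, and details may differ from what was used in the paper. Assumptions: the network is a symmetric weight matrix with a zero diagonal, labels are 0/1 for a single function, a gene's score is the fraction of its connection weight going to training-set positives, and each fold is evaluated on the held-out positives against the true negatives. The function names and the default of three folds are illustrative choices.

```python
import random

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def neighbour_voting_cv(adjacency, labels, n_folds=3, seed=0):
    """Mean AUROC over folds for one function (illustrative sketch)."""
    n = len(labels)
    positives = [i for i in range(n) if labels[i]]
    random.Random(seed).shuffle(positives)
    folds = [positives[f::n_folds] for f in range(n_folds)]
    aucs = []
    for held_out in folds:
        if not held_out:
            continue
        hidden = set(held_out)
        # Training labels: hide the annotation of the held-out positives.
        train = [1 if labels[i] and i not in hidden else 0 for i in range(n)]
        scores = []
        for i in range(n):
            degree = sum(adjacency[i])
            votes = sum(w * t for w, t in zip(adjacency[i], train))
            scores.append(votes / degree if degree > 0 else 0.0)
        # Evaluate held-out positives against the true negatives.
        eval_idx = [i for i in range(n) if i in hidden or not labels[i]]
        aucs.append(auroc([scores[i] for i in eval_idx],
                          [1 if i in hidden else 0 for i in eval_idx]))
    return sum(aucs) / len(aucs)

# Toy network: genes 0-2 form one triangle, genes 3-5 another.
toy_adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_result = neighbour_voting_cv(toy_adjacency, toy_labels)
```

In the toy network the positives are connected only to each other, so every held-out positive outranks every negative. The precision-recall variants listed above are not covered by this sketch.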