Multifunctionality drives gene characterization: A re-evaluation of hubs and promiscuity in gene function prediction

Supplement to The impact of multifunctional genes on “guilt by association” analysis.
Gillis, J and Pavlidis, P.
PLoS One 2011 Feb 18;6(2):e17258

Hello, you’ve reached the “multifunctionality bias” project page. Helpful lists and code follow below. If you have a different metric, performance test, dataset or analysis type you’d like to see tried out, or need help implementing one, please contact Paul (paul@msl.ubc.ca) and we’ll be happy to help out. The findings of the paper are very consistent: performance tests that seem ideally suited to the data (a common choice) typically show correspondingly high prevalence bias, and the control cases often allow the same, very specific, conclusions as the real data.

 

Abstract

Many previous studies have shown that by using variants of “guilt-by-association”, gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the “associations” in the data (e.g., protein interaction partners) of a gene are necessary in establishing “guilt”. In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how the multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
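As a rough illustration of the claim that the degree of multifunctionality alone can act as a predictor, the sketch below ranks genes once by how many functions they are annotated with, then scores that single ranking against every function by AUROC. It is a minimal sketch and not the code used in the paper. The toy annotation matrix, the use of a plain annotation count as the multifunctionality score, and the helper names average_ranks and auroc are all assumptions made for this example. The score is also computed from the same annotations it is evaluated on, so it shows the bias rather than a held-out prediction.

```python
import numpy as np


def average_ranks(scores):
    """Rank scores from 1..n, giving tied values their average rank."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    start = 0
    while start < len(scores):
        end = start
        # Extend the window over a run of tied scores.
        while end + 1 < len(scores) and sorted_scores[end + 1] == sorted_scores[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def auroc(scores, labels):
    """AUROC from the rank-sum (Mann-Whitney U) statistic."""
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # undefined without both classes
    ranks = average_ranks(scores)
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: rows are genes, columns are functions,
# 1 means the gene is annotated with that function.
annotations = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])

# Assumed multifunctionality score: number of annotated functions per gene.
multifunctionality = annotations.sum(axis=1)

# The same gene ranking is reused as the "prediction" for every function.
per_function_auroc = [
    auroc(multifunctionality, annotations[:, j])
    for j in range(annotations.shape[1])
]
mean_auroc = float(np.mean(per_function_auroc))

print("AUROC per function:", per_function_auroc)
print("Mean AUROC:", mean_auroc)
```

On this toy matrix the single ranking is well above the 0.5 expected by chance for every function, even though it contains no information about which gene is associated with which. The same evaluation could be run with node degree from a network in place of the annotation count.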

 

See also

“Guilt by association” is the exception rather than the rule in gene networks

 

Supplemental Materials

 

Contact

paul@msl.ubc.ca