Assessing the Gene Ontology


Motivation: The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources, and problems with stability over time have been little explored.

Results: We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their “functional identity” over time, with 20% of genes not matching to themselves (by semantic similarity) after two years. We further find that annotation bias in GO, in which some genes are better characterized than others, has declined in yeast but generally risen in humans. Finally, we discovered that many entries in protein interaction databases are due to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks.
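The “self-match” test behind the 3% and 20% figures can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes each GO edition is available as a mapping from gene to a set of GO term IDs, and it substitutes plain Jaccard similarity of term sets for whatever semantic similarity measure the authors used. Following the convention of the semantic similarity table, a gene that ties for first still gets rank 1. The names `Edition`, `self_rank` and `fraction_not_self_matching` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical input: one GO edition as a mapping gene -> set of GO term IDs.
Edition = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two GO term sets (a stand-in for a semantic similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def self_rank(gene: str, old: Edition, new: Edition) -> int:
    """Rank of the gene's own `new` annotation when its `old` annotation is
    compared against every gene in `new`. Ties count in the gene's favour,
    so 1 means "most similar to itself or tied for first"."""
    query = old[gene]
    own = jaccard(query, new[gene])
    # Only genes strictly more similar than the gene itself push its rank down.
    better = sum(
        1 for other, terms in new.items()
        if other != gene and jaccard(query, terms) > own
    )
    return better + 1


def fraction_not_self_matching(old: Edition, new: Edition) -> float:
    """Fraction of genes present in both editions whose self-rank is not 1."""
    shared = [g for g in old if g in new]
    if not shared:
        return 0.0
    misses = sum(1 for g in shared if self_rank(g, old, new) != 1)
    return misses / len(shared)
```

Comparing every gene against every other gene is quadratic, which keeps the logic obvious. For whole-genome editions you would likely precompute similarities in bulk, and you might also propagate annotations to ancestor terms before comparing.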


Contact: paul[at] or JGillis[at] for assistance with the data.

Data files

The following files for human genes are intended to assist researchers who wish to check their own data for the types of effects we report in the paper. The files are tab-delimited. Genes are referenced by NCBI IDs or official symbols, and publications by PubMed IDs.

  • HIPPIE PPIN – The protein interaction data used in sections 3.3 and 3.4.
  • frac_confound_aved.txt – The connection-level data plotted in Figure 5A.
  • frac_confound_go_aved – The GO-term-level data plotted in Figure 5A.
  • frac_confound_go_103 – Each GO group’s confoundedness for our final data point for GO. These data are plotted in Figure 3A. “NaN” occurs where there was division by zero.
  • frac_confound_con_103.txt – Number of functions shared by gene pairs from the PPIN, and the number of those functions that are confounded, for our final data point for GO (edition 103). These data are plotted in Figure 3B.
  • frac_confound_con_aved.txt – The connection-level data plotted in Figure 5A.
  • frac_confound_GO_aved.txt – The GO-term-level data plotted in Figure 5A.
  • Confound table – List of GO IDs and PubMed IDs of the papers contributing the most confounded edges for those functions.
  • Semantic stability table – List of genes and the number of GO editions since each changed its functional identity (measured as having the highest semantic similarity with itself).
  • Semantic similarity table – Similarity ranking for each gene back through each edition of GO. A value of “1” means the gene was “most similar to itself” or tied for first.
  • Multifunctionality rankings table – List of gene multifunctionality rankings over time. Useful for researchers who want to reduce annotation bias in GO.
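To check your own network for the publication confound described in the files above, the counting can be sketched as follows. This is a minimal sketch under an assumed definition: an interaction edge is treated as confounded for a GO term that both genes share if any PubMed ID supporting the interaction also supports either gene's annotation to that term. The manuscript's exact rule may differ. The input structures and function names are hypothetical and do not describe the layout of the data files. As in frac_confound_go_103, a term that no edge shares gets NaN.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical inputs (not the layout of the data files):
#   annotations: gene -> GO term -> PubMed IDs supporting that annotation
#   edges: (gene_a, gene_b, PubMed IDs supporting the interaction)
Annotations = Dict[str, Dict[str, Set[str]]]
Edge = Tuple[str, str, Set[str]]


def confounded_terms(edge: Edge, ann: Annotations) -> Tuple[Set[str], Set[str]]:
    """Return (shared GO terms, shared terms whose annotation evidence
    overlaps the publications behind the interaction)."""
    a, b, edge_pmids = edge
    terms_a, terms_b = ann.get(a, {}), ann.get(b, {})
    shared = set(terms_a) & set(terms_b)
    confounded = {t for t in shared if (terms_a[t] | terms_b[t]) & edge_pmids}
    return shared, confounded


def edge_counts(edges: List[Edge], ann: Annotations) -> List[Tuple[str, str, int, int]]:
    """Per edge: (gene_a, gene_b, number of shared functions, number confounded)."""
    rows = []
    for edge in edges:
        shared, confounded = confounded_terms(edge, ann)
        rows.append((edge[0], edge[1], len(shared), len(confounded)))
    return rows


def term_confoundedness(edges: List[Edge], ann: Annotations) -> Dict[str, float]:
    """Per GO term: confounded edges / edges sharing the term (NaN if no edge shares it)."""
    n_shared: Dict[str, int] = {}
    n_conf: Dict[str, int] = {}
    for edge in edges:
        shared, confounded = confounded_terms(edge, ann)
        for t in shared:
            n_shared[t] = n_shared.get(t, 0) + 1
        for t in confounded:
            n_conf[t] = n_conf.get(t, 0) + 1
    all_terms = {t for terms in ann.values() for t in terms}
    return {
        t: (n_conf.get(t, 0) / n_shared[t]) if n_shared.get(t) else float("nan")
        for t in all_terms
    }
```

Under this definition, a GO group “exhibits the confound” when its value from `term_confoundedness` is greater than zero.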

Use case: The postsynaptic proteome

The following two data files were used in the analysis described in section 3.4 of the manuscript.

MISSION(TM) shRNA libraries


The Centre for High-Throughput Biology at UBC has recently acquired Sigma’s MISSION TRC whole-genome human and mouse shRNA libraries, which consist of almost 200,000 pre-cloned shRNA vectors targeting more than 22,000 human and 20,000 mouse genes. Provided by Sigma Life Science and The RNAi Consortium (TRC), the MISSION shRNA human and mouse libraries were developed at the Broad Institute, a joint venture between MIT and Harvard, and represent the most comprehensive and thoroughly validated shRNA collection available.



NSERC funds High-Dimensional Bioinformatics graduate training program


NSERC has awarded a CREATE grant to Dr. Paul Pavlidis, a professor at the Centre for High-Throughput Biology, to fund a High-Dimensional Bioinformatics (HDB) graduate training program. The Collaborative Research and Training Experience (CREATE) program is designed to improve the mentoring and training environment for graduate students, preparing them for the academic, industrial and government jobs they will hold once they have received their graduate degrees.


Technological innovation has given scientists an unprecedented ability to gather and analyze vast quantities of data. Experts who can organize, visualize, analyze and interpret these large data sets are in high demand in industry and academia. This work requires skills in data mining, data modelling and statistical hypothesis testing, as well as multidisciplinary training in genetics and genomics. The High-Dimensional Bioinformatics (HDB) program is a partnership between UBC and Simon Fraser University and will provide graduate students with training in collaborative research, professional skills and entrepreneurship.


“We’re thrilled NSERC is supporting this new bioinformatics graduate training program. One of the most exciting features is our internship program where students will spend a summer outside their primary research lab, working with data-generating groups in academia or industry. This experience will help our trainees come out of the program better prepared for the challenges and opportunities of being a bioinformatician in the real world,” said Dr. Pavlidis.


See the announcement on UBC News.


Withers Lab in the news


“Hang a unit of O-neg, stat!”  


Words that are heard time and again on every medical drama on TV. Radio ads from Canadian Blood Services like to remind us that it can take 50 units of blood to save a car crash victim, but O-negative blood (which can be given to patients of all blood types) is in short supply and some blood types are very rare. What do you do when a patient needs a blood transfusion but you haven’t had time to type and match their blood, or you don’t have their blood type in the blood bank? It’s a problem that scientists have been working on for years but haven’t been able to find a cost-effective solution for – until now.


The Withers Lab at the Centre for High-Throughput Biology, in collaboration with scientists in the Centre for Blood Research at UBC, has created an enzyme that could potentially solve this problem. The enzyme works by snipping off the antigens found in Type A and Type B blood, making it more like Type O. This research was just published in the Journal of the American Chemical Society.


For more about this study, see these stories from UBC News and the American Society for Biochemistry and Molecular Biology.

NEUROCARTA – Gene & Phenotype Database Now Available

A massive database linking genes to phenotypes has been developed by Dr. Paul Pavlidis and is now available to the scientific community.  The tool is called Neurocarta and it consolidates information on genes and phenotypes across multiple resource platforms and allows tracking and exploration of their associations.


Phenotypes are recorded using controlled vocabularies to facilitate computational inference and linking to external data sources. The associations between genes and phenotypes are filtered by stringent criteria to focus on the annotations most likely to be relevant. Researchers can also enter their own annotations. Neurocarta currently holds over 30,000 lines of evidence linking over 7,000 genes to 2,000 different phenotypes.


This brand-new resource supplants the limited information that was previously available. Only a small amount of data linking genes to phenotypes was available through public resources, and what did exist was scattered across multiple access tools. Neurocarta’s in-depth annotation of neurodevelopmental disorders makes it a unique resource for neuroscientists working on brain development.


To learn more about Neurocarta visit

Terry Fox Foundation Provides $13.4 Million for World-Class, Novel Research into Fighting Cancer with Viruses

The Terry Fox Foundation provides $13.4 million for world-class, novel research into fighting cancer with viruses and finding ways to treat acute leukemias.
OTTAWA, ON – Two world-class, long-standing and prestigious national cancer teams, conducting breakthrough research into fighting cancer with viruses and investigating ways to treat acute leukemias, today received a combined $13.4 million shot in the arm from The Terry Fox Foundation (TFF) to continue their work. The funds are raised by TFF through its annual Terry Fox community and school runs and invested through its national research arm, The Terry Fox Research Institute (TFRI).

Stephen Withers Elected a Fellow of the Royal Society of London

Congratulations to Professor Stephen Withers, who has been elected a Fellow of the Royal Society (London). The honour recognizes Dr. Withers’ contributions to our understanding of the reaction mechanism of enzymes.


Dr. Withers is one of the world’s foremost experts on the enzymes that assemble and degrade carbohydrates, which perform important biological functions such as storing energy and providing structural support for cell walls. The impact of his work extends across many disciplines. His research has opened new doors for treating various diseases and introduced new research techniques. Among his achievements is an inhibitor with the potential to prevent the influenza virus from using its own enzymes to spread through the body. His work could also lead to improved treatments for diabetes and other diseases, as well as new ways to synthesize complex sugars using engineered enzymes.


Dr. Withers completed his BSc and PhD at the University of Bristol and his postdoctoral work at the University of Alberta, and he has been at UBC since 1982. The Royal Society of London is a self-governing Fellowship made up of world-renowned scientists, engineers and technologists from the UK and the Commonwealth. Fellows are elected for life through a peer review process on the basis of excellence in science.

Canadian Neuroinformatics and Computational Neuroscience Launches New Website

Canadian Neuroinformatics and Computational Neuroscience (CNCN) would like to introduce its new website,


A portal for communication aimed at enhancing the visibility and impact of Neuroinformatics and Computational Neuroscience in Canada, the website includes a public directory to facilitate communication between researchers and trainees. Sign up for the mailing list if you are interested in updates on the CNCN’s activities.


CNCN welcomes feedback and suggestions such as resources, potential institutional partners, and training programs that you would like them to publicize. Use the contact form to submit your suggestions.

NSERC Funds New Genome Science + Technology Graduate Programs at UBC

UBC has received generous support from NSERC’s CREATE Training Program to fund two new graduate programs in Genome Science + Technology. Details on the NSERC CREATE program can be found on the NSERC web site.


Prospective students, please visit the new Genome Science + Technology web site.

Pavlidis Lab

Pavlidis Lab: Research Interests
My research lies at the intersection of bioinformatics and neuroscience. I have a particular interest in neuropsychiatric disorders such as schizophrenia and autism, and in how they affect the function of chemical synapses. A current focus of work in my lab is the large-scale analysis and meta-analysis of functional genomics data (e.g., microarrays). We use these approaches to study gene networks and their involvement in human neuropsychiatric diseases. To this end, we collaborate closely with many laboratory-based neuroscience researchers from UBC and elsewhere. A newer area of interest is the analysis of neuroanatomical data. Using text mining as well as existing data sources, we are analyzing brain structure as it relates to gene expression and the brain “connectome”.