Front Genet. 2015 Aug 04;6:260. doi: 10.3389/fgene.2015.00260. eCollection 2015.

Correcting for the study bias associated with protein-protein interaction measurements reveals differences between protein degree distributions from different cancer types.

Frontiers in genetics

Martin H Schaefer, Luis Serrano, Miguel A Andrade-Navarro

Affiliations

  1. Systems Biology Research Unit, Centre for Genomic Regulation - European Molecular Biology Laboratory, Barcelona, Spain; Universitat Pompeu Fabra, Barcelona, Spain.
  2. Systems Biology Research Unit, Centre for Genomic Regulation - European Molecular Biology Laboratory, Barcelona, Spain; Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain.
  3. Faculty of Biology, Johannes Gutenberg University of Mainz, Mainz, Germany; Institute of Molecular Biology, Mainz, Germany.

PMID: 26300911 PMCID: PMC4523822 DOI: 10.3389/fgene.2015.00260

Abstract

Protein-protein interaction (PPI) networks are associated with multiple types of biases, partly rooted in technical limitations of the experimental techniques. Another source of bias is the different frequencies with which proteins have been studied for interaction partners. It is generally believed that proteins with a large number of interaction partners tend to be essential, evolutionarily conserved, and involved in disease. It has been repeatedly reported that proteins driving tumor formation have a higher number of PPI partners. However, it has been noticed before that the degree distribution of PPI networks is biased toward disease proteins, which tend to have been studied more often than non-disease proteins. At the same time, for many poorly characterized proteins no interactions have been reported yet. It is unclear to what extent this study bias affects the observation that cancer proteins tend to have more PPI partners. Here, we show that the degree of a protein is a function of the number of times it has been screened for interaction partners. We present a randomization-based method that controls for this bias to decide whether a group of proteins is associated with significantly more PPI partners than the proteomic background. We apply our method to cancer proteins and observe, in contrast to previous studies, no conclusive evidence that cancer proteins have a significantly higher degree than non-cancer proteins when we compare them to proteins that have been studied equally often as bait proteins. Comparing proteins from different tumor types, a more complex picture emerges in which proteins of certain cancer classes have significantly more interaction partners while others are associated with a smaller degree. For example, proteins of several hematological cancers tend to be associated with a higher number of interaction partners than expected by chance. Solid tumors, in contrast, are usually associated with a degree distribution similar to those of equally often studied random protein sets. We discuss the biological implications of these findings. Our work shows that accounting for biases in the PPI network is possible and increases the value of PPI data.
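
The idea of comparing a protein group against equally often studied random proteins can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the procedure, so the exact matching on bait count, the use of the mean degree as test statistic, the function name matched_degree_test, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import defaultdict
from statistics import mean


def matched_degree_test(degree, bait_count, group, n_iter=1000, seed=0):
    """Empirical one-sided p-value that `group` has a higher mean degree
    than random protein sets studied equally often as bait.

    Assumption (not from the paper): "equally often studied" is modeled
    as an exact match on the bait count.
    """
    rng = random.Random(seed)

    # Bin background (non-group) proteins by how often they were used as bait.
    bins = defaultdict(list)
    for protein, count in bait_count.items():
        if protein not in group:
            bins[count].append(protein)

    # Keep only group proteins that have at least one matched background protein.
    # Sorting makes the random draws reproducible for a fixed seed.
    usable = sorted(p for p in group if bins.get(bait_count[p]))
    if not usable:
        raise ValueError("no group protein has a bait-count-matched background protein")

    observed = mean(degree[p] for p in usable)

    # Draw matched random sets and count how often they reach the observed mean.
    hits = 0
    for _ in range(n_iter):
        sample = [rng.choice(bins[bait_count[p]]) for p in usable]
        if mean(degree[q] for q in sample) >= observed:
            hits += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


# Hypothetical toy data: heavily studied proteins (bait count 5) have high degree.
degree = {"A": 11, "B": 11, "C": 2, "D": 3, "E": 11, "F": 9, "G": 2, "H": 4}
bait_count = {"A": 5, "B": 5, "C": 1, "D": 1, "E": 5, "F": 5, "G": 1, "H": 1}

observed, p_value = matched_degree_test(degree, bait_count, {"A", "B"})
print(f"observed mean degree: {observed}, matched p-value: {p_value:.3f}")
```

In this toy example the group {A, B} has a mean degree well above the average of all other proteins, but the matched comparison draws only from the equally often studied proteins E and F, so the apparent excess is not significant.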

Keywords: cancer genes; degree distribution; network analysis; protein–protein interactions; study bias
