
BMC Bioinformatics. 2018 Sep 06;19(1):313. doi: 10.1186/s12859-018-2256-5.

Effective normalization for copy number variation in Hi-C data.


Nicolas Servant, Nelle Varoquaux, Edith Heard, Emmanuel Barillot, Jean-Philippe Vert

Affiliations

  1. Institut Curie, PSL Research University, Paris, F-75005, France.
  2. INSERM, U900, Paris, F-75005, France.
  3. Mines ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, F-75006, France.
  4. Department of Statistics, University of California, Berkeley, USA.
  5. Berkeley Institute for Data Science, Berkeley, USA.
  6. Institut Curie, PSL Research University, CNRS UMR3215, INSERM U934, Paris, F-75005, France.
  7. Ecole Normale Supérieure, PSL Research University, Department of Mathematics and Applications, Paris, F-75005, France.

PMID: 30189838 PMCID: PMC6127909 DOI: 10.1186/s12859-018-2256-5

Abstract

BACKGROUND: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C pose particular challenges. Although several methods have been proposed, the most widely used type of Hi-C normalization usually casts the estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other.
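The matrix-balancing formulation can be illustrated with a minimal ICE-style iterative correction. This is a plain-Python sketch, not code from any particular package; `ice_balance` and its parameters are illustrative names, and the sketch assumes a small symmetric, strictly positive contact matrix so that the iteration converges. Each pass divides every count by the product of the two bins' coverage ratios, so the equal-visibility assumption is built in: at convergence every bin has the same total number of contacts.

```python
def ice_balance(matrix, max_iter=200, tol=1e-8):
    """ICE-style iterative correction of a symmetric contact matrix (sketch).

    Encodes the equal-visibility assumption: after balancing, every bin has
    (approximately) the same row sum. Returns (balanced, biases), where
    balanced[i][j] equals matrix[i][j] / (biases[i] * biases[j]) up to rounding.
    """
    n = len(matrix)
    w = [list(map(float, row)) for row in matrix]  # working copy
    biases = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in w]
        covered = [s for s in sums if s > 0]        # ignore empty bins
        mean = sum(covered) / len(covered)
        # Per-bin correction factor: coverage relative to the mean coverage.
        d = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                w[i][j] /= d[i] * d[j]
        biases = [b * di for b, di in zip(biases, d)]
        if max(abs(di - 1.0) for di in d) < tol:   # all bins at mean coverage
            break
    return w, biases


# Toy symmetric contact map with uneven coverage (illustrative values).
contacts = [[10.0, 4.0, 1.0],
            [4.0, 8.0, 2.0],
            [1.0, 2.0, 6.0]]
balanced, biases = ice_balance(contacts)
print([round(sum(row), 4) for row in balanced])  # row sums are now all equal
```

The same multiplicative correction is applied to both ends of each contact, which keeps the balanced matrix symmetric.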

RESULTS: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes.
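Why equal visibility breaks down can be sketched on a toy example, under loudly stated assumptions. The hypothetical `simulate_cnv` scales each count by (c_i/2)(c_j/2), where c is the copy number of each bin; this is a deliberate simplification and not the simulation model proposed in the paper. The hypothetical `balance` generalizes the ICE-style iteration so that row sums become proportional to a prescribed target instead of being forced equal. Balancing to equal row sums erases the extra coverage of amplified bins, whereas balancing towards targets proportional to copy number keeps it. This only illustrates the general idea of modelling copy number inside matrix balancing; it is not the published LOIC or CAIC procedure.

```python
def balance(matrix, target=None, max_iter=500, tol=1e-10):
    """Symmetric matrix balancing towards prescribed relative row sums (sketch).

    target=None reproduces the equal-visibility assumption (all row sums equal);
    otherwise row sums end up proportional to target[i] > 0.
    Assumes a strictly positive symmetric matrix so the iteration converges.
    """
    n = len(matrix)
    t = [1.0] * n if target is None else [float(x) for x in target]
    w = [list(map(float, row)) for row in matrix]
    biases = [1.0] * n
    for _ in range(max_iter):
        # Coverage of each bin relative to its target share.
        ratios = [sum(w[i]) / t[i] for i in range(n)]
        mean = sum(ratios) / n
        d = [r / mean for r in ratios]
        for i in range(n):
            for j in range(n):
                w[i][j] /= d[i] * d[j]
        biases = [b * di for b, di in zip(biases, d)]
        if max(abs(di - 1.0) for di in d) < tol:
            break
    return w, biases


def simulate_cnv(diploid, copy_number):
    """Hypothetical toy model: counts scale with (c_i / 2) * (c_j / 2)."""
    n = len(diploid)
    return [[diploid[i][j] * (copy_number[i] / 2) * (copy_number[j] / 2)
             for j in range(n)] for i in range(n)]


diploid = [[10.0, 5.0, 2.0, 1.0],
           [5.0, 10.0, 5.0, 2.0],
           [2.0, 5.0, 10.0, 5.0],
           [1.0, 2.0, 5.0, 10.0]]
cn = [2, 2, 4, 4]                       # bins 2 and 3 carry a hypothetical gain
observed = simulate_cnv(diploid, cn)
flat, _ = balance(observed)                       # equal visibility
kept, _ = balance(observed, [c / 2 for c in cn])  # coverage follows copy number
print([round(sum(r) / sum(flat[0]), 3) for r in flat])  # -> [1.0, 1.0, 1.0, 1.0]
print([round(sum(r) / sum(kept[0]), 3) for r in kept])  # -> [1.0, 1.0, 2.0, 2.0]
```

Only the target marginals differ between the two calls; the balancing loop itself is unchanged, which is what makes this kind of extension cheap to add to existing matrix-balancing code.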

CONCLUSIONS: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs.

Keywords: Cancer; Copy-number; Hi-C; Normalization

