Assessing quality standards for ChIP-seq and related massive parallel sequencing-generated datasets: When rating goes beyond avoiding the crisis.

Genom Data. 2014 Aug 14;2:268-73. doi: 10.1016/j.gdata.2014.08.002. eCollection 2014 Dec.

Genomics Data

Marco Antonio Mendoza-Parra, Hinrich Gronemeyer

Affiliations

Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labellisée Ligue Contre le Cancer, Centre National de la Recherche Scientifique UMR 7104, Institut National de la Santé et de la Recherche Médicale U964, University of Strasbourg, Illkirch, France.

PMID: 26484107 PMCID: PMC4536145 DOI: 10.1016/j.gdata.2014.08.002

Abstract

Massively parallel DNA sequencing, combined with chromatin immunoprecipitation and a large variety of DNA/RNA-enrichment methodologies, has given rise to data resources of major importance. Indeed, these resources, available for multiple genomes, represent the most comprehensive catalogue of (i) cell-, development- and signal transduction-specific patterns of binding sites for transcription factors ('cistromes') and for the transcription and chromatin-modifying machineries, and (ii) the patterns of specific local post-translational modifications of histones and DNA ('epigenome') or of regulatory chromatin-binding factors. In addition, (iii) resources describing alterations in chromatin structure are emerging. Importantly, these types of "omics" datasets increasingly populate public repositories and provide highly valuable resources for exploring general principles of cell function in a multi-dimensional genome-transcriptome-epigenome-chromatin structure context. However, data mining is critically dependent on data quality, an issue that, surprisingly, is still largely ignored by scientists, well-financed consortia, data repositories and scientific journals. So what determines the quality of ChIP-seq experiments and of the datasets generated from them, and what keeps scientists from associating quality criteria with their data? In this 'opinion' we trace the various parameters that influence the quality of this type of dataset, as well as the computational efforts made so far to assess it. Moreover, we describe a universal quality control (QC) certification approach that provides a quality rating for ChIP-seq and enrichment-related assays. The corresponding QC tool and a regularly updated database, from which the quality parameters of more than 8000 datasets can currently be retrieved, are freely accessible at www.ngs-qc.org.

Keywords: ChIP sequencing; Massive parallel sequencing; Omics data mining; Quality control

Publication Types

Journal Article