BMC Genomics. 2013 May 25;14:349. doi: 10.1186/1471-2164-14-349.

NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data.

Nak-Kyeong Kim, Rasika V Jayatillake, John L Spouge

Affiliations

  1. Mathematics and Statistics Department, Old Dominion University, Norfolk, VA 23529, USA.

PMID: 23706083 PMCID: PMC3672025 DOI: 10.1186/1471-2164-14-349

Abstract

BACKGROUND: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale. Although many models and programs are available to call peaks, none has dominated its competition in comparison studies.

RESULTS: We propose a rigorous statistical model, the normal-exponential two-peak (NEXT-peak) model, which parallels the physical processes generating the empirical data, and which can naturally incorporate mappability information. The model therefore estimates total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to an estimated binding location. The comparison study with existing programs on real ChIP-seq datasets (STAT1, NRSF, and ZNF143) demonstrates that the NEXT-peak model performs well both in calling peaks and locating them. The model also provides a goodness-of-fit test, to screen out spurious peaks and to infer multiple binding events in a region.

CONCLUSIONS: The NEXT-peak program calls peaks on any test dataset about as accurately as any other, but provides unusual accuracy in the estimated location of the peaks it calls. NEXT-peak is based on rigorous statistics, so its model also provides a principled foundation for a more elaborate statistical analysis of ChIP-seq data.
