Biomol Detect Quantif. 2015 Sep 01;5:30-37. doi: 10.1016/j.bdq.2015.08.003.

Control for stochastic sampling variation and qualitative sequencing error in next generation sequencing.

Biomolecular detection and quantification

Thomas Blomquist, Erin L Crawford, Jiyoun Yeo, Xiaolu Zhang, James C Willey

Affiliations

  1. Department of Pathology, University of Toledo Health Sciences Campus, Toledo, OH 43614.
  2. Department of Medicine, University of Toledo Health Sciences Campus, Toledo, OH 43614.
  3. Department of Pathology, University of Toledo Health Sciences Campus, Toledo, OH 43614; Department of Medicine, University of Toledo Health Sciences Campus, Toledo, OH 43614.

PMID: 26693143 PMCID: PMC4673681 DOI: 10.1016/j.bdq.2015.08.003

Abstract

BACKGROUND: Clinical implementation of Next-Generation Sequencing (NGS) is challenged by poor control for stochastic sampling, library preparation biases and qualitative sequencing error. To address these challenges we developed and tested two hypotheses.

METHODS: Hypothesis 1: Analytical variation in quantification is predicted by stochastic sampling effects at input of (a) amplifiable nucleic acid target molecules into the library preparation, (b) amplicons from the library into the sequencer, or (c) both. We derived equations using Monte Carlo simulation to predict the assay coefficient of variation (CV) under each of these three working models and tested them against NGS data from specimens with well-characterized molecule inputs and sequence counts, prepared using a competitive multiplex-PCR amplicon-based NGS library preparation method comprising synthetic internal standards (IS). Hypothesis 2: Frequencies of technically derived qualitative sequencing errors (i.e., base substitution, insertion and deletion) observed at each base position in each target native template (NT) are concordant with those observed in the respective competitive synthetic IS present in the same reaction. We measured error frequencies at each base position within amplicons from each of 30 target NT, then tested whether they corresponded to those within the 30 respective IS.
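The two-step sampling idea in hypothesis 1 can be illustrated with a minimal Monte Carlo sketch. This is not the authors' derivation or their equations: it assumes that both sampling events (molecules into library preparation, amplicons into the sequencer) are Poisson distributed, and the function names `simulate_cv` and `analytic_cv` are hypothetical. Under that assumption, the law of total variance gives CV = sqrt(1/M + 1/R) for M expected input molecules and R expected sequence reads, which the simulation should approximate.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method.

    Suitable for modest means only; exp(-lam) underflows for lam above ~700.
    """
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_cv(mean_molecules, mean_reads, n_trials=20000, seed=0):
    """Estimate the CV of read counts when both sampling steps are stochastic.

    Assumption (not from the source): both steps are Poisson, and library
    amplification itself adds no variance.
    """
    rng = random.Random(seed)
    read_counts = []
    for _ in range(n_trials):
        # Step 1: molecules that actually enter library preparation.
        molecules = poisson(rng, mean_molecules)
        # Step 2: reads sampled into the sequencer, in proportion to step 1.
        expected_reads = mean_reads * molecules / mean_molecules
        read_counts.append(poisson(rng, expected_reads))
    return statistics.stdev(read_counts) / statistics.mean(read_counts)


def analytic_cv(mean_molecules, mean_reads):
    """Closed-form CV for the Poisson-Poisson model above."""
    return math.sqrt(1.0 / mean_molecules + 1.0 / mean_reads)
```

Calling `simulate_cv(20, 100)` and `analytic_cv(20, 100)` should give similar values. Setting one of the two means very large approximates the single-step models (a) or (b), since the corresponding term in the formula then becomes negligible.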

RESULTS: For hypothesis 1, the Monte Carlo model derived from both sampling events best predicted CV and explained 74% of observed assay variance. For hypothesis 2, observed frequency and type of sequence variation at each base position within each IS was concordant with that observed in respective NTs (R
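The concordance test in hypothesis 2 can be sketched as a per-position comparison of error frequencies between NT and IS reads. The sketch below is illustrative only and is not the authors' pipeline: it handles substitutions in already-aligned, equal-length reads (no insertions or deletions), and the function names `substitution_frequencies` and `r_squared` are hypothetical.

```python
def substitution_frequencies(reference, reads):
    """Fraction of reads that disagree with the reference at each position.

    Assumes every read is aligned to the reference and has the same length.
    """
    mismatches = [0] * len(reference)
    for read in reads:
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if ref_base != read_base:
                mismatches[i] += 1
    return [count / len(reads) for count in mismatches]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

In use, one would compute `substitution_frequencies` separately for the NT reads and for the IS reads of the same target, then pass the two frequency lists to `r_squared`. Positions where the IS was deliberately designed to differ from the NT would need to be excluded first.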

CONCLUSION: In targeted NGS, synthetic competitive IS control for stochastic sampling at input of both target into library preparation and of target library product into sequencer, and control for qualitative errors generated during library preparation and sequencing. These controls enable accurate clinical diagnostic reporting of confidence limits and limit of detection for copy number measurement, and of frequency for each actionable mutation.
