Genome Med. 2013 Mar 27;5(3):28. doi: 10.1186/gm432. eCollection 2013.

Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing.

Genome medicine

Jason O'Rawe, Tao Jiang, Guangqing Sun, Yiyang Wu, Wei Wang, Jingchu Hu, Paul Bodily, Lifeng Tian, Hakon Hakonarson, W Evan Johnson, Zhi Wei, Kai Wang, Gholson J Lyon

Affiliations

  1. Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA.
  2. BGI-Shenzhen, Shenzhen 518000, China.
  3. New Jersey Institute of Technology, Martin Luther King Jr. Blvd, Newark, 07103, USA.
  4. Brigham Young University, N University Ave, Provo, 84606, USA.
  5. Children's Hospital of Philadelphia, Civic Center Blvd, Philadelphia, 19104, USA.
  6. Boston University School of Medicine, E Concord St, Boston, 02118, USA.
  7. University of Southern California, 1501 San Pablo Street, Los Angeles, 90089, USA; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA.
  8. Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA.

PMID: 23537139 PMCID: PMC3706896 DOI: 10.1186/gm432

Abstract

BACKGROUND: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.

METHODS: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.

RESULTS: SNV concordance among the five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% among the three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. Of the CG variants falling within the regions targeted in exome sequencing, 11% were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only, and shared SNVs could be validated, respectively, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only, and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven) demonstrated the accuracy gained in variant discovery by having access to genetic data from a multi-generational family.
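To make the concordance figures concrete, the following is a minimal Python sketch of how such percentages could be computed from per-pipeline call sets. It is an illustration under stated assumptions, not the authors' code: the abstract does not define concordance, so the sketch assumes it is the number of variants reported by every pipeline divided by the number reported by at least one. The function names, the (chrom, pos, ref, alt) tuple layout, and the toy data are hypothetical. The indel helper assumes calls have already been left-normalized by an external tool and reads "intervalizing by 20 base pairs" as treating two indels on the same chromosome as one event when their positions differ by at most 20 bp, which is one possible interpretation.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Share of the union of all calls that every pipeline reports.

    Assumed definition: |intersection of all sets| / |union of all sets|.
    """
    sets = list(calls.values())
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_fraction(calls: Dict[str, Set[Variant]], name: str) -> float:
    """Share of the union that only the named pipeline reports."""
    others = set().union(*(s for n, s in calls.items() if n != name))
    union = others | calls[name]
    if not union:
        return 0.0
    return len(calls[name] - others) / len(union)


def indel_matches(a: Variant, b: Variant, window: int = 20) -> bool:
    """Assumed matching rule for left-normalized indels: same chromosome
    and start positions no more than `window` base pairs apart."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= window


# Toy call sets for three hypothetical pipelines (not data from the study).
toy = {
    "pipeline_a": {("1", 100, "A", "G"), ("1", 200, "C", "T")},
    "pipeline_b": {("1", 100, "A", "G"), ("2", 50, "G", "A")},
    "pipeline_c": {("1", 100, "A", "G"), ("1", 200, "C", "T")},
}

if __name__ == "__main__":
    print(f"concordance: {concordance(toy):.3f}")
    print(f"unique to pipeline_b: {unique_fraction(toy, 'pipeline_b'):.3f}")
```

In the toy data, one of three distinct variants is shared by all pipelines, so the assumed concordance is one third. Exact-match set operations suit SNVs; for indels, the windowed comparison would replace exact tuple equality when deciding whether two pipelines reported the same event.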

CONCLUSIONS: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
