Front Genet. 2014 May 06;5:111. doi: 10.3389/fgene.2014.00111. eCollection 2014.

Quality control of next-generation sequencing data without a reference.

Frontiers in genetics

Urmi H Trivedi, Timothée Cézard, Stephen Bridgett, Anna Montazam, Jenna Nichols, Mark Blaxter, Karim Gharbi

Affiliations

  1. Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK.
  2. Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK; Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK.

PMID: 24834071 PMCID: PMC4018527 DOI: 10.3389/fgene.2014.00111

Abstract

Next-generation sequencing (NGS) technologies have dramatically expanded the breadth of genomics. Genome-scale data, once restricted to a small number of biomedical model organisms, can now be generated for virtually any species at remarkable speed and low cost. Yet non-model organisms often lack a suitable reference to map sequence reads against, making alignment-based quality control (QC) of NGS data more challenging than in cases where a well-assembled genome is already available. Here we show that by generating a rapid, non-optimized draft assembly of raw reads, it is possible to obtain reliable and informative QC metrics, thus removing the need for a high-quality reference. We use benchmark datasets generated from control samples across a range of genome sizes to illustrate that QC inferences made using draft assemblies are broadly equivalent to those made using a well-established reference, and describe QC tools routinely used in our production facility to assess the quality of NGS data from non-model organisms.

Keywords: Illumina sequencing; PCR duplicates; de novo assembly; insert size; mate pair; quality control
