Sci Data. 2016 Sep 27;3:160081. doi: 10.1038/sdata.2016.81.

Next generation sequencing data of a defined microbial mock community.

Scientific data

Esther Singer, Bill Andreopoulos, Robert M Bowers, Janey Lee, Shweta Deshpande, Jennifer Chiniquy, Doina Ciobanu, Hans-Peter Klenk, Matthew Zane, Christopher Daum, Alicia Clum, Jan-Fang Cheng, Alex Copeland, Tanja Woyke

Affiliations

  1. DOE Joint Genome Institute, Walnut Creek, California 94598, USA.
  2. Newcastle University, Newcastle upon Tyne, NE1 7RU, UK.

PMID: 27673566 PMCID: PMC5037974 DOI: 10.1038/sdata.2016.81

Abstract

Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for the benchmarking of new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms to assess critical components such as sequencing errors and biases relies on such datasets. Here we report the next-generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes, cover a range of GC contents, genome sizes, and repeat contents, and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.

Conflict of interest statement

The authors declare no competing financial interests.

Publication Types