Microbiome. 2015 May 05;3:19. doi: 10.1186/s40168-015-0083-8. eCollection 2015.

Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data.

Jake Jervis-Bardy, Lex E X Leong, Shashikanth Marri, Renee J Smith, Jocelyn M Choo, Heidi C Smith-Vaughan, Elizabeth Nosworthy, Peter S Morris, Stephen O'Leary, Geraint B Rogers, Robyn L Marsh

Affiliations

  1. Menzies School of Health Research, Child Health Division, Charles Darwin University, Darwin, NT Australia ; School of Medicine, Flinders University, Bedford Park, Adelaide, SA Australia ; Infection and Immunity Theme, South Australia Health and Medical Research Institute, North Terrace, Adelaide, SA Australia.
  2. Infection and Immunity Theme, South Australia Health and Medical Research Institute, North Terrace, Adelaide, SA Australia.
  3. School of Medicine, Flinders University, Bedford Park, Adelaide, SA Australia.
  4. Infection and Immunity Theme, South Australia Health and Medical Research Institute, North Terrace, Adelaide, SA Australia ; School of Biological Sciences, Flinders University, Adelaide, South Australia 5001 Australia.
  5. Menzies School of Health Research, Child Health Division, Charles Darwin University, Darwin, NT Australia.
  6. Department of Otolaryngology, University of Melbourne, Melbourne, VIC Australia.
  7. School of Medicine, Flinders University, Bedford Park, Adelaide, SA Australia ; Infection and Immunity Theme, South Australia Health and Medical Research Institute, North Terrace, Adelaide, SA Australia.

PMID: 25969736 PMCID: PMC4428251 DOI: 10.1186/s40168-015-0083-8

Abstract

BACKGROUND: The rapid expansion of 16S rRNA gene sequencing in challenging clinical contexts has resulted in a growing body of literature of variable quality. To a large extent, this is due to a failure to address the spurious signal that is characteristic of samples with low levels of bacteria and high levels of non-bacterial DNA. We have developed a workflow for paired-end Illumina MiSeq data that enables a significant post-sequencing improvement in data quality. We demonstrate the efficacy of this methodology through its application to paediatric upper-respiratory samples from several anatomical sites.

RESULTS: A workflow for processing sequence data was developed based on commonly available tools. Data generated from different sample types showed marked variation in levels of non-bacterial signal and 'contaminant' bacterial reads. Significant differences were observed in the ability of reference databases to accurately assign identity to operational taxonomic units (OTUs). Three OTU-picking strategies were trialled (de novo, open-reference and closed-reference), with open-reference performing substantially better than the other two. The relative abundance of OTUs identified as potential reagent contamination showed a strong inverse correlation with amplicon concentration, allowing their objective removal. Removal of the spurious signal gave the greatest improvement in sample types that typically contain low levels of bacteria and high levels of human DNA. Principal coordinate and co-occurrence analyses demonstrated a substantial impact of pre-filtering data and removing spurious signal. For example, analysis of taxon co-occurrence in adenoid swab and middle ear fluid samples indicated that failure to remove the spurious signal resulted in the inclusion of six out of eleven bacterial genera that accounted for 80% of similarity between the sample types.
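The contaminant criterion described above (relative abundance falling as amplicon concentration rises) can be illustrated with a minimal sketch. The abstract does not state which correlation statistic or cut-off was used, so the Spearman rank correlation, the threshold of -0.7, the function names and the toy numbers below are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_x * var_y) ** 0.5


def flag_contaminants(rel_abundance, amplicon_conc, threshold=-0.7):
    """Flag OTUs whose relative abundance is inversely correlated with
    amplicon concentration. The threshold is a hypothetical cut-off."""
    return sorted(
        otu
        for otu, abundance in rel_abundance.items()
        if spearman(abundance, amplicon_conc) <= threshold
    )


# Hypothetical toy data: five samples, amplicon concentration per sample
# and per-sample relative abundance for two made-up OTUs.
amplicon_conc = [0.5, 1.0, 2.0, 4.0, 8.0]
rel_abundance = {
    "OTU_A": [0.40, 0.30, 0.10, 0.05, 0.01],  # shrinks as concentration rises
    "OTU_B": [0.10, 0.30, 0.20, 0.35, 0.25],  # no inverse trend
}

flagged = flag_contaminants(rel_abundance, amplicon_conc)
print(flagged)
```

With these toy values only OTU_A is flagged, because its abundance decreases monotonically as concentration increases. A rank-based statistic is used here because it does not assume a linear relationship; a real analysis would also need a significance test and a justified threshold.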

CONCLUSIONS: The application of the presented workflow to a set of challenging clinical samples demonstrates its utility in removing the spurious signal from the dataset, allowing clinical insight to be derived from what would otherwise be highly misleading output. While other approaches could potentially achieve similar improvements, the methodology employed here represents an accessible means to exclude the signal from contamination and other artefacts.

Keywords: 16S rRNA; Contamination; MiSeq; Otitis media; Pair-end reads; QIIME; Respiratory
