Ecol Evol. 2015 Jun;5(11):2252-66. doi: 10.1002/ece3.1497. Epub 2015 May 13.

Toward accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods.

Ecology and evolution

Jullien M Flynn, Emily A Brown, Frédéric J J Chain, Hugh J MacIsaac, Melania E Cristescu

Affiliations

  1. Department of Biology, McGill University, 1205 Docteur Penfield, Stewart Biology Building, Montreal, Quebec, Canada, H3A 1B1.
  2. Department of Biology, McGill University, 1205 Docteur Penfield, Stewart Biology Building, Montreal, Quebec, Canada, H3A 1B1; Great Lakes Institute for Environmental Research, University of Windsor, Windsor, Ontario, Canada.
  3. Great Lakes Institute for Environmental Research, University of Windsor, Windsor, Ontario, Canada.

PMID: 26078860 PMCID: PMC4461425 DOI: 10.1002/ece3.1497

Abstract

Metabarcoding has the potential to become a rapid, sensitive, and effective approach for identifying species in complex environmental samples. Accurate molecular identification of species depends on the ability to generate operational taxonomic units (OTUs) that correspond to biological species. Because this method sometimes yields enormous estimates of biodiversity, there is a great need to test the efficacy of the data analysis methods used to derive OTUs. Here, we evaluate the performance of various methods for clustering length-variable 18S amplicons from complex samples into OTUs using a mock community and a natural community of zooplankton species. We compare analytic procedures consisting of a combination of (1) stringent and relaxed data filtering, (2) singleton sequences included and removed, (3) three commonly used clustering algorithms (mothur, UCLUST, and UPARSE), and (4) three methods of treating alignment gaps when calculating sequence divergence. Depending on the combination of methods used, the number of OTUs varied by nearly two orders of magnitude for the mock community (60-5068 OTUs) and three orders of magnitude for the natural community (22-22191 OTUs). The use of relaxed filtering and the inclusion of singletons greatly inflated OTU numbers without increasing the ability to recover species. Our results also suggest that the method used to treat gaps when calculating sequence divergence can have a great impact on the number of OTUs. Our findings are particularly relevant to studies that cover taxonomically diverse species and employ markers such as rRNA genes, in which length variation is extensive.
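
To illustrate why gap treatment matters for length-variable markers, the following is a minimal Python sketch of pairwise divergence between two aligned sequences under three gap conventions. The abstract does not name the three methods tested, so the conventions here (skip gapped columns, count every gapped column, count each contiguous gap run once) and the names `pairwise_distance`, `ignore`, `each_gap`, and `one_gap` are assumptions chosen for illustration, not the authors' implementation.

```python
from itertools import groupby

GAP = "-"
GAP_MODES = {"ignore", "each_gap", "one_gap"}  # hypothetical mode names


def pairwise_distance(a: str, b: str, gap_mode: str = "ignore") -> float:
    """Proportion of differing positions between two aligned sequences.

    gap_mode (illustrative conventions, assumed rather than taken from the paper):
      "ignore"   - columns containing a gap are skipped entirely
      "each_gap" - every gapped column counts as one difference
      "one_gap"  - each contiguous run of gaps counts as one difference
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if gap_mode not in GAP_MODES:
        raise ValueError(f"unknown gap_mode: {gap_mode!r}")

    # Classify every column, dropping columns that are gaps in both sequences.
    states = []
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue
        if x == GAP:
            states.append("gap_a")
        elif y == GAP:
            states.append("gap_b")
        elif x == y:
            states.append("match")
        else:
            states.append("mismatch")

    diffs = 0
    compared = 0
    # Walk runs of identical states so that gap runs can be collapsed.
    for state, run in groupby(states):
        n = len(list(run))
        if state == "match":
            compared += n
        elif state == "mismatch":
            diffs += n
            compared += n
        elif gap_mode == "each_gap":
            diffs += n
            compared += n
        elif gap_mode == "one_gap":
            diffs += 1
            compared += 1
        # gap_mode == "ignore": gapped columns contribute nothing

    return diffs / compared if compared else 0.0


seq_a = "ACGTACGTAC"
seq_b = "ACG---GTAC"  # same sequence with a 3-base deletion
for mode in ("ignore", "each_gap", "one_gap"):
    print(mode, pairwise_distance(seq_a, seq_b, mode))
# ignore 0.0
# each_gap 0.3
# one_gap 0.125
```

Under a hypothetical 3% divergence threshold, these two sequences would fall in the same OTU when gaps are ignored but in separate OTUs under either of the other two conventions, which is the kind of sensitivity the abstract reports.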

Keywords: 18S rRNA; OTU; biodiversity; eDNA; high-throughput sequencing; metabarcoding
