Front Cell Dev Biol. 2014 Nov 19;2:70. doi: 10.3389/fcell.2014.00070. eCollection 2014.

Integrative workflows for metagenomic analysis.

Frontiers in cell and developmental biology

Efthymios Ladoukakis, Fragiskos N Kolisis, Aristotelis A Chatziioannou

Affiliations

  1. Laboratory of Biotechnology, Department of Chemical Engineering, School of Chemical Engineering, National Technical University of Athens, Athens, Greece.
  2. Metabolic Engineering and Bioinformatics Program, Institute of Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece.

PMID: 25478562 PMCID: PMC4237130 DOI: 10.3389/fcell.2014.00070

Abstract

The rapid evolution of sequencing technologies, collectively described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with sensitive measuring techniques in order to potentially discover, properly assemble, and map allelic sequences to the correct genomes, achieving particularly high yields at a fraction of the cost of traditional processes (i.e., Sanger sequencing). From a bioinformatic perspective, each sequencing experiment generates many gigabytes of data, which makes the management, and even the storage, of that data a critical bottleneck for the overall analysis. This complexity is further aggravated by the variety of processing steps available: numerous bioinformatic tools are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, yet non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, and they demand a high level of expertise for their proper application or for the clean implementation of the whole workflow. Furthermore, a bioinformatic analysis of this scale requires substantial computational resources, leaving the use of cloud computing infrastructures as the sole realistic solution. In this review article, we discuss the integrative bioinformatic solutions available to address these issues, critically assessing the automated pipelines for data management, quality control, and annotation of metagenomic data across various major sequencing technologies and applications.

Keywords: bioinformatics; cloud computing; distributed computing; metagenomics; workflow engines

