Front Cell Dev Biol. 2014 Nov 19;2:70. doi: 10.3389/fcell.2014.00070. eCollection 2014.
Integrative workflows for metagenomic analysis.
Frontiers in cell and developmental biology
Efthymios Ladoukakis, Fragiskos N Kolisis, Aristotelis A Chatziioannou
Affiliations
- Laboratory of Biotechnology, Department of Chemical Engineering, School of Chemical Engineering, National Technical University of Athens, Athens, Greece.
- Metabolic Engineering and Bioinformatics Program, Institute of Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece.
PMID: 25478562
PMCID: PMC4237130 DOI: 10.3389/fcell.2014.00070
Abstract
The rapid evolution of sequencing technologies, collectively described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques, with the aim of discovering, properly assembling, and mapping allelic sequences to the correct genomes, and they achieve particularly high yields at only a fraction of the cost of traditional processes (i.e., Sanger sequencing). From a bioinformatic perspective, this means that every single sequencing experiment generates many gigabytes (GB) of data, making data management, and even storage, critical bottlenecks for the overall analytical endeavor. This complexity is further aggravated by the variety of available processing steps, reflected in the numerous bioinformatic tools required for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, yet non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, and they demand a high level of expertise both for their proper application and for the sound implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires vast computational resources, making cloud computing infrastructures the sole realistic solution. In this review article, we discuss the integrative bioinformatic solutions available to address these issues, critically assessing the automated pipelines for data management, quality control, and annotation of metagenomic data across the major sequencing technologies and applications.
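To make the "simple, yet non-trivial" quality-control step concrete, the following is a minimal sketch, not taken from any of the reviewed pipelines, that parses FASTQ records and discards reads whose mean Phred quality falls below a threshold. It assumes Phred+33 quality encoding (Sanger/Illumina 1.8+ style), a mean-quality cutoff of 20 and four-line records without wrapping; these choices, and the function names `parse_fastq`, `mean_phred` and `filter_reads`, are hypothetical illustrations rather than settings stated by the authors.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# (header, sequence, quality string); names here are illustrative only
Record = Tuple[str, str, str]


def parse_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream, assuming 4 unwrapped lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string; Phred+33 encoding is an assumption."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle: TextIO, min_mean_q: float = 20.0, min_len: int = 1) -> List[Record]:
    """Keep reads with mean quality >= min_mean_q and length >= min_len (hypothetical cutoffs)."""
    return [
        rec
        for rec in parse_fastq(handle)
        if len(rec[1]) >= min_len and mean_phred(rec[2]) >= min_mean_q
    ]


if __name__ == "__main__":
    demo = io.StringIO(
        "@read1\nACGT\n+\nIIII\n"  # 'I' encodes Q40 under Phred+33
        "@read2\nACGT\n+\n####\n"  # '#' encodes Q2 under Phred+33
    )
    kept = filter_reads(demo)
    print([h for h, _, _ in kept])  # ['@read1']
```

Real pipelines typically add adapter trimming, per-base sliding-window trimming and paired-end handling; a whole-read mean cutoff is shown here only because it is the simplest filter that still illustrates why QC needs explicit, documented parameter choices.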
Keywords: bioinformatics; cloud computing; distributed computing; metagenomics; workflow engines