Front Microbiol. 2021 Feb 22;12:635781. doi: 10.3389/fmicb.2021.635781. eCollection 2021.

Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions.

Frontiers in microbiology

Isabel Moreno-Indias, Leo Lahti, Miroslava Nedyalkova, Ilze Elbere, Gennady Roshchupkin, Muhamed Adilovic, Onder Aydemir, Burcu Bakir-Gungor, Enrique Carrillo-de Santa Pau, Domenica D'Elia, Mahesh S Desai, Laurent Falquet, Aycan Gundogdu, Karel Hron, Thomas Klammsteiner, Marta B Lopes, Laura Judith Marcos-Zambrano, Cláudia Marques, Michael Mason, Patrick May, Lejla Pašić, Gianvito Pio, Sándor Pongor, Vasilis J Promponas, Piotr Przymus, Julio Saez-Rodriguez, Alexia Sampri, Rajesh Shigdel, Blaz Stres, Ramona Suharoschi, Jaak Truu, Ciprian-Octavian Truică, Baiba Vilne, Dimitrios Vlachakis, Ercument Yilmaz, Georg Zeller, Aldert L Zomer, David Gómez-Cabrero, Marcus J Claesson

Affiliations

  1. Instituto de Investigación Biomédica de Málaga (IBIMA), Unidad de Gestión Clínica de Endocrinología y Nutrición, Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain.
  2. Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain.
  3. Department of Computing, University of Turku, Turku, Finland.
  4. Human Genetics and Disease Mechanisms, Latvian Biomedical Research and Study Centre, Riga, Latvia.
  5. Latvian Biomedical Research and Study Centre, Riga, Latvia.
  6. Department of Epidemiology, Erasmus Medical Center, Rotterdam, Netherlands.
  7. Department of Genetics and Bioengineering, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina.
  8. Department of Electrical and Electronics Engineering, Karadeniz Technical University, Trabzon, Turkey.
  9. Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey.
  10. Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain.
  11. Department for Biomedical Sciences, Institute for Biomedical Technologies, National Research Council, Bari, Italy.
  12. Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg.
  13. Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, Odense, Denmark.
  14. Department of Biology, University of Fribourg, Fribourg, Switzerland.
  15. Swiss Institute of Bioinformatics, Lausanne, Switzerland.
  16. Department of Microbiology and Clinical Microbiology, Faculty of Medicine, Erciyes University, Kayseri, Turkey.
  17. Metagenomics Laboratory, Genome and Stem Cell Center (GenKök), Erciyes University, Kayseri, Turkey.
  18. Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia.
  19. Department of Microbiology, University of Innsbruck, Innsbruck, Austria.
  20. NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal.
  21. Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal.
  22. CINTESIS, NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal.
  23. Computational Oncology, Sage Bionetworks, Seattle, WA, United States.
  24. Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
  25. Sarajevo Medical School, University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina.
  26. Department of Computer Science, University of Bari Aldo Moro, Bari, Italy.
  27. Faculty of Information Technology and Bionics, Pázmány University, Budapest, Hungary.
  28. Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus.
  29. Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland.
  30. Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Heidelberg, Germany.
  31. Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom.
  32. Department of Clinical Science, University of Bergen, Bergen, Norway.
  33. Jozef Stefan Institute, Ljubljana, Slovenia.
  34. Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia.
  35. Faculty of Civil and Geodetic Engineering, University of Ljubljana, Ljubljana, Slovenia.
  36. Molecular Nutrition and Proteomics Lab, Faculty of Food Science and Technology, Institute of Life Sciences, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania.
  37. Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.
  38. Department of Computer Science and Engineering, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania.
  39. Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia.
  40. Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece.
  41. Department of Computer Technologies, Karadeniz Technical University, Trabzon, Turkey.
  42. European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany.
  43. Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands.
  44. Navarrabiomed, Complejo Hospitalario de Navarra (CHN), IdiSNA, Universidad Pública de Navarra (UPNA), Pamplona, Spain.
  45. School of Microbiology and APC Microbiome Ireland, University College Cork, Cork, Ireland.

PMID: 33692771 PMCID: PMC7937616 DOI: 10.3389/fmicb.2021.635781

Abstract

The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. Many of the challenges in microbiome research are similar to those in other high-throughput studies; quantitative analyses need to address the heterogeneity of the data, their specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges, ranging from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Although many statistical and machine learning approaches and tools have been developed, new techniques are still needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome", which brings together microbiome researchers and machine learning experts to address current challenges such as the standardization of analysis pipelines for reproducible data analysis results, benchmarking, and the improvement of existing tools and ontologies or the development of new ones.

Copyright © 2021 Moreno-Indias, Lahti, Nedyalkova, Elbere, Roshchupkin, Adilovic, Aydemir, Bakir-Gungor, Santa Pau, D’Elia, Desai, Falquet, Gundogdu, Hron, Klammsteiner, Lopes, Marcos-Zambrano, Marques, Mason, May, Pašić, Pio, Pongor, Promponas, Przymus, Saez-Rodriguez, Sampri, Shigdel, Stres, Suharoschi, Truu, Truică, Vilne, Vlachakis, Yilmaz, Zeller, Zomer, Gómez-Cabrero and Claesson.

Keywords: ML4Microbiome; biomarker identification; machine learning; microbiome; personalized medicine

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
