NAR Genom Bioinform. 2021 Jul 20;3(3):lqab065. doi: 10.1093/nargab/lqab065. eCollection 2021 Sep.

DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies.

NAR genomics and bioinformatics

Bettina Mieth, Alexandre Rozier, Juan Antonio Rodriguez, Marina M C Höhne, Nico Görnitz, Klaus-Robert Müller

Affiliations

  1. Machine Learning Group, Technische Universität Berlin, Berlin 10587, Germany.
  2. CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona 08003, Spain.
  3. 123ai.de, Berlin 10319, Germany.

PMID: 34296082 PMCID: PMC8291080 DOI: 10.1093/nargab/lqab065

Abstract

Deep learning has revolutionized data science in many fields by greatly improving prediction performances in comparison to conventional approaches. Recently, explainable artificial intelligence has emerged as an area of research that goes beyond pure prediction improvement by extracting knowledge from deep learning methodologies through the interpretation of their results. We investigate such explanations to explore the genetic architectures of phenotypes in genome-wide association studies. Instead of testing each position in the genome individually, the novel three-step algorithm, called DeepCOMBI, first trains a neural network for the classification of subjects into their respective phenotypes. Second, it explains the classifiers' decisions by applying layer-wise relevance propagation as one example from the pool of explanation techniques. The resulting importance scores are eventually used to determine a subset of the most relevant locations for multiple hypothesis testing in the third step. The performance of DeepCOMBI in terms of power and precision is investigated on generated datasets and a 2007 study. Verification of the latter is achieved by validating all findings with independent studies published up until 2020. DeepCOMBI is shown to outperform ordinary raw
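The three steps named in the abstract (train a classifier, score each position by relevance, test only the top-scoring positions) can be illustrated with a minimal sketch. This is not the DeepCOMBI implementation. It makes several simplifying assumptions: a logistic-regression classifier stands in for the neural network, the product of weight and input (which is what relevance propagation reduces to for a single linear layer) stands in for layer-wise relevance propagation, and a Bonferroni correction over the k selected positions stands in for the paper's significance threshold. All function names and parameters (`train_logistic`, `relevance_scores`, `chi2_pvalue`, `rank_and_test`, `k`, `alpha`) are hypothetical.

```python
import math
import numpy as np


def train_logistic(X, y, lr=0.1, epochs=200):
    """Step 1 (stand-in): fit a logistic-regression classifier by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted case probability
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def relevance_scores(X, w):
    """Step 2 (stand-in): per-position relevance as the mean of |w_j * x_ij|."""
    return np.abs(X * w).mean(axis=0)


def chi2_pvalue(genotypes, y):
    """Chi-square test on the 2x3 phenotype-by-genotype table.

    Assumes genotypes coded 0/1/2 and labels coded 0/1. The p-value uses
    the closed form exp(-x/2) for 2 degrees of freedom, which is
    conservative if one genotype class is empty.
    """
    table = np.zeros((2, 3))
    for g, label in zip(genotypes, y):
        table[int(label), int(g)] += 1
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    mask = expected > 0
    stat = ((table - expected)[mask] ** 2 / expected[mask]).sum()
    return math.exp(-stat / 2.0)


def rank_and_test(X, y, k, alpha=0.05):
    """Step 3: test only the k positions with the highest relevance."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize columns
    w, _ = train_logistic(Xs, y)
    scores = relevance_scores(Xs, w)
    top = np.argsort(scores)[::-1][:k]
    pvals = {int(j): chi2_pvalue(X[:, j], y) for j in top}
    # Bonferroni over the k tested positions (an assumption of this sketch)
    hits = [j for j, p in pvals.items() if p < alpha / k]
    return top, pvals, hits
```

Calling `rank_and_test(X, y, k=5)` on a genotype matrix `X` (subjects by positions, coded 0/1/2) and a binary phenotype vector `y` returns the indices of the selected positions, their p-values, and the positions that pass the corrected threshold. Testing k positions rather than every position reduces the number of hypotheses that the correction has to account for.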

© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

Publication Types