F1000Res. 2020 Oct 15;9:1239. doi: 10.12688/f1000research.26429.2. eCollection 2020.

netDx: Software for building interpretable patient classifiers by multi-'omic data integration using patient similarity networks.

F1000Research

Shraddha Pai, Philipp Weber, Ruth Isserlin, Hussam Kaka, Shirley Hui, Muhammad Ahmad Shah, Luca Giudice, Rosalba Giugno, Anne Krogh Nøhr, Jan Baumbach, Gary D Bader

Affiliations

  1. The Donnelly Centre, University of Toronto, Toronto, Canada.
  2. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
  3. Department of Computer Science, University of Verona, Verona, Italy.
  4. The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen N, Denmark.
  5. H. Lundbeck A/S, Copenhagen, Denmark.
  6. TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
  7. Department of Molecular Genetics, University of Toronto, Toronto, Canada.
  8. Department of Computer Science, University of Toronto, Toronto, Canada.
  9. The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Canada.

PMID: 33628435 PMCID: PMC7883323 DOI: 10.12688/f1000research.26429.2

Abstract

Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing overfitting risk. netDx is a machine learning method to integrate multi-modal patient data and build a patient classifier. Patient data are converted into networks of patient similarity, a representation that is intuitive to clinicians, who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data - a common problem in real-world data - without requiring imputation. netDx also has excellent interpretability, with native support to group genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits. Workflows offer versatility through custom feature design and choice of similarity metric; speed is improved by parallel execution. Built-in functions and examples allow users to compute model performance metrics such as AUROC, AUPR, and accuracy. netDx uses RCy3 to visualize top-scoring pathways and the final integrated patient network in Cytoscape. Advanced users can build more complex predictor designs with the functional building blocks used in the default design. Finally, the netDx Bioconductor package provides a novel workflow for pathway-based patient classification from sparse genetic data.
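
The idea of assigning a new patient to the class with the greatest profile similarity can be illustrated with a minimal Python sketch. This is not the netDx implementation or its API (netDx is an R/Bioconductor package): the function names `pearson` and `classify`, the use of Pearson correlation as the similarity metric, the mean-similarity scoring rule, and the toy profiles are all assumptions made for illustration. Missing values are simply skipped pairwise, which is one simple way to avoid imputation and may differ from how netDx handles them.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0  # too few shared measurements to estimate similarity
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treat as no similarity
    return cov / sqrt(var_x * var_y)

def classify(new_profile, train_profiles, train_labels):
    """Return the class whose training patients are, on average, most similar."""
    sims_by_class = {}
    for profile, label in zip(train_profiles, train_labels):
        sims_by_class.setdefault(label, []).append(pearson(new_profile, profile))
    scores = {label: sum(s) / len(s) for label, s in sims_by_class.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: four training patients, four measurements each.
# None marks a missing measurement; no value is imputed.
train_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, None, 4.5],
    [4.0, 3.0, 2.0, 1.0],
    [4.2, 3.1, 2.2, 0.9],
]
train_labels = ["good", "good", "poor", "poor"]
new_patient = [0.9, 2.1, 3.2, None]

predicted, scores = classify(new_patient, train_profiles, train_labels)
print(predicted, scores)
```

In this toy setting the new patient's profile rises across measurements like the "good" group, so that class receives the higher mean similarity. A real predictor would also score and select features over repeated train/test splits, as the abstract describes.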

Copyright: © 2021 Pai S et al.

Keywords: classification; data integration; genomics; networks; precision medicine; supervised learning

Conflict of interest statement

No competing interests were disclosed.
