J Comput Aided Mol Des. 2021 Jul;35(7):819-830. doi: 10.1007/s10822-021-00400-x. Epub 2021 Jun 28.

Predicting partition coefficients for the SAMPL7 physical property challenge using the ClassicalGSG method.

Journal of computer-aided molecular design

Nazanin Donyapour, Alex Dickson

Affiliations

  1. Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA.
  2. Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA.
  3. Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA.

PMID: 34181200 PMCID: PMC8295205 DOI: 10.1007/s10822-021-00400-x

Abstract

The prediction of log P values is one part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges. Here, we use a molecular graph representation method called Geometric Scattering for Graphs (GSG) to transform atomic attributes into molecular features. The atomic attributes used here are parameters from classical molecular force fields, including partial charges and Lennard-Jones interaction parameters. The molecular features from GSG are used as inputs to neural networks that are trained using a "master" dataset comprising over 41,000 unique log P values. The specific molecular targets in the SAMPL7 log P prediction challenge were unique in that they all contained a sulfonyl moiety. This motivated a set of ClassicalGSG submissions where predictors were trained on different subsets of the master dataset that were filtered according to chemical types and/or the presence of the sulfonyl moiety. We find that our ranked prediction obtained 5th place with an RMSE of 0.77 log P units and an MAE of 0.62, while one of our non-ranked predictions achieved first place among all submissions with an RMSE of 0.55 and an MAE of 0.44. After the conclusion of the challenge, we also examined the performance of open-source force field parameters that allow for an end-to-end log P predictor model: General AMBER Force Field (GAFF), Universal Force Field (UFF), Merck Molecular Force Field 94 (MMFF94) and Ghemical. We find that ClassicalGSG models trained with atomic attributes from MMFF94 can yield more accurate predictions than those trained with CGenFF atomic attributes.
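To illustrate how per-atom attributes can be turned into a fixed-length molecular feature vector, the following is a minimal sketch of a geometric scattering transform on a molecular graph. It follows the general construction from the geometric scattering literature (lazy random walk wavelets, absolute values, and moment sums) and is not the authors' ClassicalGSG implementation. The function names, the number of scales, the choice of moments, and the toy attribute values are all assumptions made for illustration.

```python
import numpy as np


def lazy_random_walk(adj):
    """Lazy random walk matrix P = 1/2 (I + A D^-1).

    Assumes an undirected graph with no isolated nodes.
    """
    deg = adj.sum(axis=0)
    # Broadcasting divides column j by deg[j], which gives A D^-1.
    return 0.5 * (np.eye(adj.shape[0]) + adj / deg)


def wavelets(P, num_scales):
    """Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j) for j = 1..num_scales."""
    powers = [np.eye(P.shape[0]), P]
    for _ in range(num_scales):
        powers.append(powers[-1] @ powers[-1])  # repeated squaring: P^2, P^4, ...
    return [powers[j] - powers[j + 1] for j in range(num_scales + 1)]


def scattering_features(adj, x, num_scales=2, moments=(1, 2)):
    """Zeroth-, first- and second-order scattering moments.

    adj: (n_atoms, n_atoms) adjacency matrix
    x:   (n_atoms, n_attributes) atomic attributes
    Returns a 1D feature vector whose length does not depend on n_atoms.
    """
    psis = wavelets(lazy_random_walk(adj), num_scales)
    feats = []
    # Zeroth order: plain moments of the raw atomic attributes.
    for q in moments:
        feats.append(np.sum(x ** q, axis=0))
    # First order: moments of |Psi_j x|.
    first = [np.abs(psi @ x) for psi in psis]
    for u in first:
        for q in moments:
            feats.append(np.sum(u ** q, axis=0))
    # Second order: moments of |Psi_j2 |Psi_j1 x|| for j1 < j2.
    for j1 in range(len(psis)):
        for j2 in range(j1 + 1, len(psis)):
            u2 = np.abs(psis[j2] @ first[j1])
            for q in moments:
                feats.append(np.sum(u2 ** q, axis=0))
    return np.concatenate(feats)


# Toy 4-atom chain. Columns are hypothetical values for partial charge,
# Lennard-Jones radius and Lennard-Jones well depth (not real parameters).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.array([[-0.30, 1.90, 0.10],
              [0.10, 2.00, 0.08],
              [0.25, 2.00, 0.08],
              [-0.05, 1.80, 0.12]])

features = scattering_features(adj, x)
print(features.shape)
```

Because every feature is a sum over atoms, the resulting vector has the same length for any molecule and does not change when the atoms are reordered, which is what makes it usable as input to a standard feed-forward neural network.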

© 2021. The Author(s), under exclusive licence to Springer Nature Switzerland AG.

Keywords: Chemical features; Geometric scattering for graphs; Log P; Machine learning; Molecular representations; Neural networks; Partition coefficient; SAMPL7 challenge
