
J Cheminform. 2018 Apr 03;10(1):17. doi: 10.1186/s13321-018-0271-1.

A confidence predictor for logD using conformal regression and a support-vector machine.

Journal of cheminformatics

Maris Lapins, Staffan Arvidsson, Samuel Lampa, Arvid Berg, Wesley Schaal, Jonathan Alvarsson, Ola Spjuth

Affiliations

  1. Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden.
  2. Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden.

PMID: 29616425 PMCID: PMC5882484 DOI: 10.1186/s13321-018-0271-1

Abstract

Lipophilicity is a major determinant of the ADMET properties and overall suitability of drug candidates. To aid drug discovery projects, we have developed large-scale models that predict the water-octanol distribution coefficient (logD) of chemical compounds. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, we built and evaluated models with a support-vector machine (linear kernel) within the conformal prediction framework, so that each prediction is output as an interval at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text], and the best-performing nonconformity measure gives a median prediction interval of [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service through an OpenAPI interface and a web page with a molecular editor. We also publish predicted values at the 90% confidence level for 91 million PubChem structures in RDF format, both as a download and through a URI resolver service.
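The conformal step summarized above can be illustrated with a minimal sketch of inductive (split) conformal regression. It assumes a toy dataset with a single numeric descriptor and swaps ordinary least squares in for the linear-kernel SVM. The absolute-error score and the optional normalized score (using a hypothetical `difficulty` estimate and smoothing term `beta`) are common textbook nonconformity measures, not necessarily the ones the authors compared. All function names and data values are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Model = Callable[[float], float]


def fit_least_squares(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical stand-in for the linear-kernel SVM; any fitted regressor works here.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def conformal_quantile(scores: Sequence[float], confidence: float) -> float:
    """Return the k-th smallest calibration score, k = ceil((n + 1) * confidence)."""
    n = len(scores)
    # Small tolerance keeps float error from pushing an exact integer product just above it.
    k = math.ceil((n + 1) * confidence - 1e-9)
    if k > n:
        return math.inf  # too few calibration examples for this confidence level
    return sorted(scores)[k - 1]


def icp_interval(
    model: Model,
    calib_x: Sequence[float],
    calib_y: Sequence[float],
    x_new: float,
    confidence: float,
    difficulty: Optional[Callable[[float], float]] = None,
    beta: float = 0.0,
) -> Tuple[float, float]:
    """Prediction interval for x_new from an inductive conformal regressor.

    Without `difficulty`, the nonconformity score is the absolute error |y - y_hat|.
    With it, scores are normalized: |y - y_hat| / (difficulty(x) + beta).
    """
    def scale(x: float) -> float:
        return 1.0 if difficulty is None else difficulty(x) + beta

    # Calibration examples must not have been used to fit `model`.
    scores = [abs(y - model(x)) / scale(x) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, confidence)
    half_width = q * scale(x_new)  # rescale back to log units for the new input
    y_hat = model(x_new)
    return y_hat - half_width, y_hat + half_width


# Toy data: one descriptor x and a logD-like response (hypothetical values).
train_x = [0, 1, 2, 3]
train_y = [1, 3, 5, 7]
calib_x = list(range(9))
calib_y = [2 * x + 1 + d for x, d in zip(calib_x, [0.5, -1, 1.5, -2, 2.5, -3, 3.5, -4, 4.5])]

model = fit_least_squares(train_x, train_y)
print(icp_interval(model, calib_x, calib_y, x_new=10, confidence=0.8))  # prints (17.0, 25.0)
```

The interval half-width is the calibration score at rank ceil((n + 1) × confidence), which is what gives conformal intervals their coverage guarantee when calibration and test examples are exchangeable. Raising the confidence from 80% to 90% selects a higher-ranked score and therefore a wider interval, mirroring the two confidence levels reported in the abstract. The normalized variant widens intervals for inputs judged harder and narrows them for easier ones.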

Keywords: Conformal prediction; LogD; Machine learning; QSAR; RDF; Support-vector machine

