J Cheminform. 2009 Jul 14;1(1):11. doi: 10.1186/1758-2946-1-11.

DPRESS: Localizing estimates of predictive uncertainty.

Journal of cheminformatics

Robert D Clark

Affiliations

  1. Biochemical Infometrics, 827 Renee Lane, Creve Coeur MO 63141, USA.

PMID: 20298517 PMCID: PMC3225832 DOI: 10.1186/1758-2946-1-11

Abstract

BACKGROUND: The need to have a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not actually the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set, adjusted for its separation d from u in the descriptor space relative to the size of the training set. The predictive uncertainty factor γ_t* is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s_t* and γ_t* are characteristic of each training set compound contributing to the model of interest.
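
The lookup step described above (find the closest training object t* and its separation d from the new object u, then adjust that object's error by d) can be sketched as follows. This is a minimal illustration and not the published implementation. The abstract does not give the functional form of the adjustment, so it is left as a caller-supplied function. The names nearest_training_object, localized_uncertainty and placeholder_adjust, the use of Euclidean distance, and the placeholder formula are all assumptions made for this sketch.

```python
import math

def nearest_training_object(u, training_descriptors):
    """Return (index, distance) of the training object closest to u.

    Euclidean distance in descriptor space is an assumption of this sketch.
    """
    best_index, best_distance = -1, math.inf
    for i, t in enumerate(training_descriptors):
        d = math.dist(u, t)
        if d < best_distance:
            best_index, best_distance = i, d
    return best_index, best_distance

def localized_uncertainty(u, training_descriptors, s, gamma, adjust):
    """Estimate s_u from the nearest training object t*.

    s[i] and gamma[i] are the per-object error and uncertainty factor.
    adjust(s_t, gamma_t, d) supplies the distance adjustment; its form is
    NOT specified in the abstract and must come from the full method.
    """
    t_star, d = nearest_training_object(u, training_descriptors)
    return t_star, d, adjust(s[t_star], gamma[t_star], d)

def placeholder_adjust(s_t, gamma_t, d):
    """Hypothetical adjustment, for illustration only: grows with d."""
    return s_t * math.sqrt(1.0 + gamma_t * d * d)

training = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
s = [0.2, 0.5, 0.9]
gamma = [0.1, 0.3, 0.2]
t_star, d, s_u = localized_uncertainty((3.0, 5.0), training, s, gamma,
                                       placeholder_adjust)
```

Passing the adjustment in as a function keeps the nearest-neighbor logic separate from the one piece the abstract leaves unspecified.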

RESULTS: The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229-compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the distributions of predictive error observed and those predicted using DPRESS. Inclusion of the distance-dependent term was essential to getting good agreement between the estimated uncertainties and the observed distributions of predictive error. The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so.
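
One generic way to compare observed test-set errors with per-compound uncertainty estimates is to standardize each error by its estimated uncertainty. The sketch below is an assumption about how such a check could be done, not the comparison used in the paper. The function names and the sample numbers are hypothetical. If the estimates are well calibrated, the root-mean-square standardized error is close to 1. Values below 1 indicate conservative (overestimated) uncertainties.

```python
import math

def rms_standardized_error(errors, uncertainties):
    """Root-mean-square of (observed error / estimated uncertainty)."""
    z_squared = [(e / s) ** 2 for e, s in zip(errors, uncertainties)]
    return math.sqrt(sum(z_squared) / len(z_squared))

def coverage(errors, uncertainties, k=2.0):
    """Fraction of observed errors that fall within k estimated uncertainties."""
    hits = sum(1 for e, s in zip(errors, uncertainties) if abs(e) <= k * s)
    return hits / len(errors)

# Made-up numbers for illustration only; not data from the study.
observed_errors = [0.1, -0.4, 0.3, 1.0]
estimated_uncertainties = [0.2, 0.4, 0.6, 0.4]
rms_z = rms_standardized_error(observed_errors, estimated_uncertainties)
within_2s = coverage(observed_errors, estimated_uncertainties)
```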

CONCLUSION: DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set based on their distance to the training set and the internal predictive uncertainty associated with its nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.

