Science. 2015 Aug 07;349(6248):636-8. doi: 10.1126/science.aaa9375.

STATISTICS. The reusable holdout: Preserving validity in adaptive data analysis.

Science (New York, N.Y.)

Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

Affiliations

  1. Microsoft Research, Mountain View, CA 94043, USA.
  2. IBM Almaden Research Center, San Jose, CA 95120, USA.
  3. Google Research, Mountain View, CA 94043, USA.
  4. Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada.
  5. Samsung Research America, Mountain View, CA 94043, USA.
  6. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA.

PMID: 26250683 DOI: 10.1126/science.aaa9375

Abstract

Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis. As an application, we show how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses.
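
The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a reusable holdout might work, assuming a thresholding-with-noise design: each query is answered from the training set unless its training and holdout estimates disagree by more than a noisy threshold, in which case a noised holdout estimate is released. The function name `make_reusable_holdout`, the parameters `threshold`, `sigma`, and `seed`, and their default values are all hypothetical choices for illustration. The sketch omits refinements such as a cap on how often the holdout is consulted, and it should not be read as the paper's exact procedure.

```python
import random
from statistics import fmean


def make_reusable_holdout(train, holdout, threshold=0.04, sigma=0.01, seed=0):
    """Return a function that answers mean-value queries about the data.

    threshold, sigma, and seed are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)

    def laplace(scale):
        # The difference of two i.i.d. exponentials is Laplace-distributed
        # with mean 0 and the given scale.
        return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    def answer(query):
        # query maps one record to a number, ideally bounded (e.g. in [0, 1]).
        train_mean = fmean(query(x) for x in train)
        holdout_mean = fmean(query(x) for x in holdout)
        # Compare the two estimates against a noisy threshold.
        if abs(train_mean - holdout_mean) > threshold + laplace(sigma):
            # The training estimate looks overfit: release a noised holdout value.
            return holdout_mean + laplace(sigma)
        # The estimates agree: answer from the training set only, so this
        # query reveals little about the holdout.
        return train_mean

    return answer


# Hypothetical usage with toy data: the analyst never sees the holdout directly.
train = [0.0, 1.0, 0.0, 1.0]
holdout = [0.0, 1.0, 1.0, 0.0]
answer = make_reusable_holdout(train, holdout)
estimate = answer(lambda x: x)
```

The design choice worth noting is that the analyst interacts with the holdout only through `answer`, which limits how much each adaptively chosen query can reveal about the holdout records.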

Copyright © 2015, American Association for the Advancement of Science.

Publication Types