JAMIA Open. 2020 Dec 14;3(4):557-566. doi: 10.1093/jamiaopen/ooaa060. eCollection 2020 Dec.

Spot the difference: comparing results of analyses from real patient data and synthetic derivatives.

Randi E Foraker, Sean C Yu, Aditi Gupta, Andrew P Michelson, Jose A Pineda Soto, Ryan Colvin, Francis Loh, Marin H Kollef, Thomas Maddox, Bradley Evanoff, Hovav Dror, Noa Zamstein, Albert M Lai, Philip R O Payne

Affiliations

  1. Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
  2. Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
  3. Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
  4. Division of Critical Care Medicine, Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Los Angeles, Los Angeles, California, USA.
  5. School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
  6. Healthcare Innovation Lab, BJC Healthcare, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
  7. MDClone Ltd, Beer Sheva, Israel.

PMID: 33623891 PMCID: PMC7886551 DOI: 10.1093/jamiaopen/ooaa060

Abstract

BACKGROUND: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.

OBJECTIVES: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.

METHODS: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses run on synthetic derivatives with those run on the original data, using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases to conduct analyses at the level of individual observations (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3).

RESULTS: For each use case, the results of the analyses were sufficiently statistically similar (

DISCUSSION AND CONCLUSION: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.

© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Keywords: data analysis; electronic health records and systems; precision health care; protected health information; synthetic data
