
Clin Epidemiol. 2017 Mar 15;9:157-166. doi: 10.2147/CLEP.S129785. eCollection 2017.

Missing data and multiple imputation in clinical epidemiological research.

Clinical Epidemiology

Alma B Pedersen, Ellen M Mikkelsen, Deirdre Cronin-Fenton, Nickolaj R Kristensen, Tra My Pham, Lars Pedersen, Irene Petersen

Affiliations

  1. Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus N, Denmark.
  2. Department of Primary Care and Population Health, University College London, London, UK.
  3. Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus N, Denmark; Department of Primary Care and Population Health, University College London, London, UK.

PMID: 28352203 PMCID: PMC5358992 DOI: 10.2147/CLEP.S129785

Abstract

Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can pose considerable challenges for the analysis and interpretation of results and may weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analysis, the missing indicator method, single-value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method for dealing with missing data that accounts for the uncertainty associated with the missing values. Multiple imputation is implemented in most statistical software under the MAR assumption and, provided the imputation model is correctly specified, provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.
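To make the MAR mechanism and the pooling step concrete, here is a minimal Python sketch on simulated data. The data-generating model, the sample size, and the helper names `simulate`, `fit_ols`, and `multiple_imputation_mean` are hypothetical and not taken from the article. The outcome y is missing more often when the fully observed covariate x is large, so the data are MAR rather than MCAR. The sketch compares the complete-case mean of y with a multiple-imputation estimate pooled using Rubin's rules. Drawing the imputation-model parameters by bootstrapping the complete cases is one simple way to reflect parameter uncertainty and is assumed here for brevity; statistical software typically uses its own (often Bayesian) parameter draws.

```python
import math
import random
import statistics


def simulate(n, rng):
    """Hypothetical simulated data (not from the article).

    x ~ N(0, 1), y = 2 + 1.5*x + N(0, 1), so the true mean of y is 2.
    y is set to None (missing) with probability 1 / (1 + exp(-2x)),
    which depends only on the observed x: a MAR mechanism.
    """
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + 1.5 * x + rng.gauss(0.0, 1.0)
        p_missing = 1.0 / (1.0 + math.exp(-2.0 * x))
        data.append((x, None if rng.random() < p_missing else y))
    return data


def fit_ols(pairs):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in pairs)
    return a, b, math.sqrt(rss / (len(pairs) - 2))


def multiple_imputation_mean(data, m, rng):
    """Estimate the mean of y by multiple imputation, pooled with Rubin's rules."""
    complete = [(x, y) for x, y in data if y is not None]
    estimates, within = [], []
    for _ in range(m):
        # Bootstrap the complete cases so each imputation uses different
        # regression parameters (assumed way of propagating parameter uncertainty).
        boot = [rng.choice(complete) for _ in complete]
        a, b, sd = fit_ols(boot)
        # Impute each missing y as a random draw around the fitted line,
        # not the fitted value itself, to keep the residual variability.
        ys = [y if y is not None else a + b * x + rng.gauss(0.0, sd)
              for x, y in data]
        estimates.append(statistics.fmean(ys))
        within.append(statistics.variance(ys) / len(ys))  # squared SE of the mean
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    w = statistics.fmean(within)            # within-imputation variance
    b_var = statistics.variance(estimates)  # between-imputation variance
    t = w + (1 + 1 / m) * b_var             # total variance (Rubin's rules)
    return q_bar, w, b_var, t


rng = random.Random(2017)
data = simulate(2000, rng)
cc_mean = statistics.fmean([y for _, y in data if y is not None])
mi_mean, w, b_var, t = multiple_imputation_mean(data, m=20, rng=rng)
print(f"complete-case mean: {cc_mean:.2f}")
print(f"multiple-imputation mean: {mi_mean:.2f} (SE {math.sqrt(t):.3f})")
```

Under this setup the complete-case mean should fall well below the true value of 2, because cases with large x (and hence large y) are preferentially missing, while the pooled multiple-imputation estimate should land close to 2. The between-imputation term `b_var` is what carries the extra uncertainty due to the missing values into the total variance `t`.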

Keywords: MAR; MCAR; MNAR; missing data; multiple imputation; observational study

Conflict of interest statement

Disclosure: The authors report no conflicts of interest in this work.
