
BMC Med Res Methodol. 2021 Aug 17;21(1):173. doi: 10.1186/s12874-021-01353-3.

Two-stage sampling in the estimation of growth parameters and percentile norms: sample weights versus auxiliary variable estimation.

BMC medical research methodology

George Vamvakas, Courtenay Norbury, Andrew Pickles

Affiliations

  1. Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
  2. Psychology and Language Sciences, University College London, London, UK.
  3. Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.

PMID: 34404347 PMCID: PMC8369688 DOI: 10.1186/s12874-021-01353-3

Abstract

BACKGROUND: Maximum likelihood estimation with auxiliary variables is not widely used for surveys in which data are missing by design, even though it has been shown to be more efficient than traditional approaches that rely on sampling weights. Efficiency gains from including Normally distributed auxiliary variables in a model have been reported in the literature, but little is known about the effect of non-Normal auxiliary variables on parameter estimation.
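For orientation, the two approaches being contrasted can be written out in a deliberately simplified setting. This is a single-outcome, cross-sectional sketch, not the authors' growth model: it assumes that a fully observed auxiliary X is Poisson(λ), that the outcome Y is observed only in a second-phase sub-sample S chosen with inclusion probabilities π_i that depend on X alone, and that Y given X is linear-Normal. Under those assumptions selection is ignorable, the likelihood factorises, and the maximum likelihood estimate of the mean of Y uses every phase-one value of X rather than weights.

```latex
% Weighted (Hajek) estimator: sub-sample only, weights w_i = 1 / \pi_i
\hat{\mu}_Y^{\mathrm{w}} = \frac{\sum_{i \in S} w_i\, y_i}{\sum_{i \in S} w_i},
\qquad w_i = \frac{1}{\pi_i}

% Auxiliary-variable likelihood (assumes X ~ Poisson(\lambda),
% Y | X ~ N(\alpha + \beta X, \sigma^2), selection depending on X only)
L(\lambda, \alpha, \beta, \sigma^2)
  = \prod_{i=1}^{n} f(x_i;\, \lambda)
    \prod_{i \in S} f(y_i \mid x_i;\, \alpha, \beta, \sigma^2)

% ML estimate of E[Y] = \alpha + \beta\lambda, with \hat\lambda = \bar{x}
% and (\hat\alpha, \hat\beta) the least-squares fit on S
\hat{\mu}_Y^{\mathrm{aux}} = \hat{\alpha} + \hat{\beta}\,\bar{x}
  = \bar{y}_S + \hat{\beta}\,(\bar{x} - \bar{x}_S),
\qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

The second form shows the mechanism. The sub-sample mean of Y is corrected by the slope times the gap between the full-sample and sub-sample means of X, so the over-represented stratum is adjusted for through the model rather than through weights.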

METHODS: We simulate growth data to mimic SCALES, a two-stage survey of language development. It has a screening phase (stage one), for which data are observed for the whole sample, and an intensive assessment phase (stage two), for which data are observed only for a sub-sample selected by stratified random sampling. In the simulation, we allow a fully observed, Poisson-distributed stratification criterion to be correlated with the partially observed model responses, and we develop five generalised structural equation growth models that incorporate the auxiliary information from this criterion. We compare these models with each other and with a weighted growth model in terms of bias, efficiency, and coverage. Finally, we apply the best-performing model to the SCALES data and show how to obtain growth parameters and population norms.
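The design described above can be imitated in a stripped-down Monte Carlo sketch. It is not the authors' simulation. It assumes a single cross-sectional outcome instead of growth trajectories, a Poisson(4) screening score, a linear-Normal outcome, and a hypothetical "high" stratum (score of 6 or more) sampled at 80% against 20% for the rest. It compares a weighted (Hájek) mean with an auxiliary-variable regression estimator on bias and empirical standard error. Every function name and parameter value is illustrative, and coverage and the five structural equation models are not reproduced.

```python
import math
import random
import statistics


def draw_poisson(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def one_replicate(rng, n=1000, lam=4.0, alpha=10.0, beta=2.0, sigma=3.0,
                  cutoff=6, frac_high=0.8, frac_low=0.2):
    """Simulate one hypothetical two-stage sample and return both estimates of E[Y]."""
    # Stage one: screening score X (Poisson) observed for the whole sample.
    x = [draw_poisson(lam, rng) for _ in range(n)]
    # Outcome Y is correlated with X; only stage-two values are used below.
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]

    # Stage two: stratified simple random sampling without replacement on X.
    strata = {"high": [i for i in range(n) if x[i] >= cutoff],
              "low": [i for i in range(n) if x[i] < cutoff]}
    fractions = {"high": frac_high, "low": frac_low}
    chosen, weights = [], []
    for name, members in strata.items():
        if not members:
            continue
        m_h = max(1, round(fractions[name] * len(members)))
        chosen.extend(rng.sample(members, m_h))
        weights.extend([len(members) / m_h] * m_h)  # inverse inclusion probability

    xs = [x[i] for i in chosen]
    ys = [y[i] for i in chosen]

    # (a) Weighted (Hajek) mean: uses the sub-sample and its weights only.
    mu_weighted = sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)

    # (b) Auxiliary-variable estimator: least-squares fit of Y on X in the
    #     sub-sample, evaluated at the full-sample mean of X.
    x_bar_s, y_bar_s = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - x_bar_s) ** 2 for xi in xs)
    sxy = sum((xi - x_bar_s) * (yi - y_bar_s) for xi, yi in zip(xs, ys))
    beta_hat = sxy / sxx
    mu_aux = y_bar_s + beta_hat * (statistics.fmean(x) - x_bar_s)
    return mu_weighted, mu_aux


def compare(reps=300, seed=2021, lam=4.0, alpha=10.0, beta=2.0):
    """Bias and empirical standard error of both estimators over `reps` replicates."""
    rng = random.Random(seed)
    true_mu = alpha + beta * lam  # E[Y] = alpha + beta * E[X]
    results = [one_replicate(rng, lam=lam, alpha=alpha, beta=beta) for _ in range(reps)]
    summary = {}
    for label, col in (("weighted", 0), ("auxiliary", 1)):
        est = [r[col] for r in results]
        summary[label] = {"bias": statistics.fmean(est) - true_mu,
                          "emp_se": statistics.stdev(est)}
    return summary


if __name__ == "__main__":
    for label, s in compare().items():
        print(f"{label:>9}: bias={s['bias']:+.3f}  empirical SE={s['emp_se']:.3f}")
```

Because the within-stratum sample sizes are fixed, each weight is simply N_h / m_h. In this toy setting the weighted estimator only "sees" X through two strata, whereas the regression estimator uses every screening value, which is where its efficiency advantage comes from.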

RESULTS: Parameter estimates from a model that incorporates a non-Normal auxiliary variable are unbiased and more efficient than those from its weighted counterpart. The auxiliary variable method can produce efficient estimates of population percentile norms and growth velocities.
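One way to turn fitted growth parameters into percentile norms and velocities is sketched below, under assumptions that may differ from the paper's model. The sketch assumes a hypothetical quadratic mean trajectory with a correlated random intercept and linear slope and Normal errors, so that the score at age t is Normal with mean μ(t) and variance V(t). The p-th percentile norm is then μ(t) + z_p·√V(t), and its velocity is the time derivative of that curve. Function names and parameter values are invented for illustration and are not SCALES estimates.

```python
import math
from statistics import NormalDist


def marginal_moments(t, fixed, re_cov, var_e):
    """Mean and variance of the score at time t under a hypothetical growth model:
    y(t) = (b0 + u0) + (b1 + u1) * t + b2 * t**2 + e,
    with (u0, u1) ~ N(0, re_cov) and e ~ N(0, var_e)."""
    b0, b1, b2 = fixed
    (v00, v01), (_, v11) = re_cov
    mean = b0 + b1 * t + b2 * t ** 2
    var = v00 + 2 * t * v01 + t ** 2 * v11 + var_e
    return mean, var


def percentile_norm(t, p, fixed, re_cov, var_e):
    """p-th population percentile (0 < p < 100) of the score at time t."""
    mean, var = marginal_moments(t, fixed, re_cov, var_e)
    return mean + NormalDist().inv_cdf(p / 100) * math.sqrt(var)


def percentile_velocity(t, p, fixed, re_cov, var_e):
    """Analytic rate of change of the p-th percentile curve at time t."""
    _, b1, b2 = fixed
    (_, v01), (_, v11) = re_cov
    _, var = marginal_moments(t, fixed, re_cov, var_e)
    z = NormalDist().inv_cdf(p / 100)
    d_mean = b1 + 2 * b2 * t          # derivative of the mean curve
    d_var = 2 * v01 + 2 * t * v11     # derivative of the marginal variance
    return d_mean + z * d_var / (2 * math.sqrt(var))


if __name__ == "__main__":
    # Purely illustrative values (time in years), not estimates from the paper.
    fixed = (20.0, 6.0, -0.3)            # intercept, linear, quadratic
    re_cov = ((9.0, 1.0), (1.0, 0.5))    # covariance matrix of (u0, u1)
    var_e = 4.0
    for age in (0, 2, 4):
        norms = {p: round(percentile_norm(age, p, fixed, re_cov, var_e), 1)
                 for p in (10, 50, 90)}
        vel = round(percentile_velocity(age, 50, fixed, re_cov, var_e), 2)
        print(age, norms, vel)
```

The median curve coincides with the mean trajectory here because the marginal distribution is assumed Normal. Outer percentiles spread out over time whenever the slope variance or intercept-slope covariance makes V(t) grow with age.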

CONCLUSIONS: The deployment of a fully observed variable that dominates the selection of the sample and correlates strongly with the incomplete variable of interest appears beneficial for the estimation process.

© 2021. The Author(s).

Keywords: Auxiliary variable; Missing data; Percentile norms; Population norms; Two-stage design; Weights

