
J Comput Graph Stat. 2017;26(1):1-13. doi: 10.1080/10618600.2016.1154063. Epub 2017 Feb 16.

Regression Models For Multivariate Count Data.

Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America

Yiwen Zhang, Hua Zhou, Jin Zhou, Wei Sun

Affiliations

  1. Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203.
  2. Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095-1772.
  3. Division of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ 85721-0066.
  4. Program in Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center, Seattle, WA 98109.

PMID: 28348500 PMCID: PMC5365157 DOI: 10.1080/10618600.2016.1154063

Abstract

Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting because of its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology with the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study several generalized linear models that incorporate various correlation structures among the counts. The current literature lacks a treatment of these models, partly because they do not belong to the natural exponential family. We study estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.
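The "restrictive mean-variance structure" can be made concrete with a small worked example. The sketch below is illustrative only and is not the authors' implementation. Under a multinomial with m trials and probabilities p (given by the multinomial-logit link p_j = exp(eta_j) / sum_k exp(eta_k)), Var(Y_j) = m p_j (1 - p_j) is fixed once the mean is fixed. Under the Dirichlet-multinomial (one of the models listed in the keywords) with parameters alpha and a0 = sum(alpha), the same variance is inflated by the factor (m + a0) / (1 + a0). All function names and the parameter values (eta, m, a0) are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random

def softmax(eta):
    """Multinomial-logit link: p_j = exp(eta_j) / sum_k exp(eta_k)."""
    top = max(eta)  # subtract the max for numerical stability
    w = [math.exp(e - top) for e in eta]
    s = sum(w)
    return [x / s for x in w]

def multinomial_var(m, p):
    """Var(Y_j) = m * p_j * (1 - p_j) under Multinomial(m, p)."""
    return [m * pj * (1.0 - pj) for pj in p]

def dirichlet_multinomial_var(m, alpha):
    """Var(Y_j) under DM(m, alpha): the multinomial variance at the same
    mean, multiplied by the over-dispersion factor (m + a0) / (1 + a0)."""
    a0 = sum(alpha)
    p = [a / a0 for a in alpha]
    inflation = (m + a0) / (1.0 + a0)
    return [v * inflation for v in multinomial_var(m, p)]

def draw_dirichlet_multinomial(m, alpha, rng):
    """Draw p ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_j, 1)
    variates, then allocate m trials by categorical draws with weights p."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    p = [x / s for x in g]
    counts = [0] * len(alpha)
    for k in rng.choices(range(len(alpha)), weights=p, k=m):
        counts[k] += 1
    return counts

# Illustrative (hypothetical) setting: 3 categories, m = 50 trials, a0 = 2.
m = 50
p = softmax([0.0, 0.5, -0.5])
alpha = [2.0 * pj for pj in p]  # same mean proportions as the multinomial
print("multinomial variances:", multinomial_var(m, p))
print("DM variances:         ", dirichlet_multinomial_var(m, alpha))
```

With a0 = 2 and m = 50 the Dirichlet-multinomial variance is (52 / 3), roughly 17 times the multinomial variance at identical means, which is the kind of over-dispersion a multinomial-logit fit cannot represent. Smaller a0 gives more over-dispersion; as a0 grows the factor tends to 1 and the multinomial is recovered.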

Keywords: Dirichlet-multinomial; analysis of deviance; categorical data analysis; generalized Dirichlet-multinomial; iteratively reweighted Poisson regression (IRPR); negative multinomial; reduced rank GLM; regularization
