J Am Stat Assoc. 2020;115(531):1079-1091. doi: 10.1080/01621459.2019.1660170. Epub 2019 Oct 16.

Genetic Variant Set-Based Tests Using the Generalized Berk-Jones Statistic with Application to a Genome-Wide Association Study of Breast Cancer.

Journal of the American Statistical Association

Ryan Sun, Xihong Lin

Affiliations

  1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
  2. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115; Department of Statistics, Harvard University, Cambridge, MA 02138.

PMID: 33041403 PMCID: PMC7539682 DOI: 10.1080/01621459.2019.1660170

Abstract

Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases like breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with individual SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk-Jones (GBJ) test for the association between a SNP-set and outcome. The GBJ extends the Berk-Jones statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation for SNP-sets of any finite size, and we develop an omnibus statistic that is robust to the degree of signal sparsity. An additional advantage of our work is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study (GWAS). We evaluate the finite sample performance of the GBJ through simulation and apply the method to identify breast cancer risk genes in a GWAS conducted by the Cancer Genetic Markers of Susceptibility Consortium. Our results suggest evidence of association between FGFR2 and breast cancer and also identify other potential susceptibility genes, complementing conventional SNP-level analysis.
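The abstract gives no formulas. For reference, the sketch below computes the classical Berk-Jones (BJ) statistic that the GBJ extends, starting from SNP-level Z statistics such as GWAS summary statistics. It assumes the commonly used form BJ = max over i ≤ d/2 of the Bernoulli Kullback-Leibler divergence D(i/d ‖ p_(i)). Only indices with p_(i) < i/d count, where p_(1) ≤ ... ≤ p_(d) are the sorted two-sided p-values. It does not implement the GBJ's correlation adjustment or its analytic p-value. Correlation among SNPs enters only through a Monte Carlo null Z ~ N(0, Σ), where Σ is a user-supplied LD correlation matrix. All function names and the example data are hypothetical.

```python
import math

import numpy as np


def two_sided_pvalues(z):
    """Two-sided normal p-values from SNP-level Z statistics: 2 * (1 - Phi(|z|))."""
    z = np.abs(np.asarray(z, dtype=float))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)); clip so extreme Z values never give p = 0
    p = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return np.clip(p, np.finfo(float).tiny, 1.0)


def bernoulli_kl(a, b):
    """KL divergence D(a || b) between Bernoulli(a) and Bernoulli(b), 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))


def berk_jones(z):
    """Classical Berk-Jones statistic for a SNP-set (no correlation adjustment).

    Assumed form: max over i = 1..floor(d/2) of D(i/d || p_(i)),
    using only indices where the i-th smallest p-value satisfies p_(i) < i/d.
    """
    p = np.sort(two_sided_pvalues(z))
    d = len(p)
    stat = 0.0
    for i in range(1, d // 2 + 1):
        frac = i / d
        if p[i - 1] < frac:
            stat = max(stat, bernoulli_kl(frac, p[i - 1]))
    return stat


def monte_carlo_pvalue(z_obs, corr, n_sim=2000, seed=0):
    """Approximate P(BJ >= observed) under the null Z ~ N(0, corr) by simulation.

    This is a stand-in for the paper's analytic GBJ p-value, not a reproduction of it.
    """
    rng = np.random.default_rng(seed)
    observed = berk_jones(z_obs)
    d = len(z_obs)
    # Draw correlated null Z vectors so LD between SNPs is reflected in the null distribution
    null_z = rng.multivariate_normal(np.zeros(d), corr, size=n_sim)
    null_stats = np.array([berk_jones(row) for row in null_z])
    # Add-one correction keeps the estimate strictly positive
    return (1 + np.sum(null_stats >= observed)) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical 5-SNP set with exchangeable LD correlation 0.3
    d = 5
    corr = np.full((d, d), 0.3) + 0.7 * np.eye(d)
    z = np.array([3.1, 2.8, 0.4, -0.2, 1.1])
    print("BJ statistic:", berk_jones(z))
    print("Monte Carlo p-value:", monte_carlo_pvalue(z, corr))
```

Simulation cost grows with `n_sim`, and resolving the very small p-values needed at genome-wide significance would take millions of draws per SNP-set. Replacing this step with an analytic calculation is the practical gap the abstract's p-value result addresses.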

Keywords: Breast cancer; FGFR2 gene; Gene-level test; Generalized higher criticism; Sparse alternative
