Genetics. 2021 May 17;218(1). doi: 10.1093/genetics/iyab044.

Robust, flexible, and scalable tests for Hardy-Weinberg equilibrium across diverse ancestries.

Alan M Kwong, Thomas W Blackwell, Jonathon LeFaive, Mariza de Andrade, John Barnard, Kathleen C Barnes, John Blangero, Eric Boerwinkle, Esteban G Burchard, Brian E Cade, Daniel I Chasman, Han Chen, Matthew P Conomos, L Adrienne Cupples, Patrick T Ellinor, Celeste Eng, Yan Gao, Xiuqing Guo, Marguerite Ryan Irvin, Tanika N Kelly, Wonji Kim, Charles Kooperberg, Steven A Lubitz, Angel C Y Mak, Ani W Manichaikul, Rasika A Mathias, May E Montasser, Courtney G Montgomery, Solomon Musani, Nicholette D Palmer, Gina M Peloso, Dandi Qiao, Alexander P Reiner, Dan M Roden, M Benjamin Shoemaker, Jennifer A Smith, Nicholas L Smith, Jessica Lasky Su, Hemant K Tiwari, Daniel E Weeks, Scott T Weiss, Laura J Scott, Albert V Smith, Gonçalo R Abecasis, Michael Boehnke, Hyun Min Kang

Affiliations

  1. Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
  2. Mayo Clinic, Rochester, MN 55905, USA.
  3. Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44106, USA.
  4. Department of Medicine, Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA.
  5. Department of Human Genetics, South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA.
  6. Department of Epidemiology, Human Genetics Center, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
  7. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
  8. Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, USA.
  9. Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA.
  10. Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115, USA.
  11. Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115, USA.
  12. Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA 02215, USA.
  13. Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
  14. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
  15. Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA.
  16. Framingham Heart Study, Framingham, MA 01702, USA.
  17. Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA.
  18. Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA 02124, USA.
  19. Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS 39216, USA.
  20. Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA 90502, USA.
  21. Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
  22. Department of Epidemiology, Tulane University, New Orleans, LA 70112, USA.
  23. Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
  24. Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
  25. Department of Public Health Sciences, Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA.
  26. GeneSTAR Research Program and Division of Allergy and Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA.
  27. Division of Endocrinology, Diabetes and Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
  28. Sarcoidosis Research Unit, Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA.
  29. Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS 39216, USA.
  30. Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA.
  31. Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
  32. Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
  33. Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.
  34. Department of Epidemiology, University of Washington, Seattle, WA 98195, USA.
  35. Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA 98101, USA.
  36. Department of Veterans Affairs, Seattle Epidemiologic Research and Information Center, Office of Research and Development, Seattle, WA 98108, USA.
  37. Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
  38. Departments of Human Genetics and Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA.

PMID: 33720349 PMCID: PMC8128395 DOI: 10.1093/genetics/iyab044

Abstract

Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.
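
As an illustration of why pooled samples from diverse ancestries can fail a standard HWE test without any genotyping error, the following minimal sketch applies the textbook 1-degree-of-freedom χ² test to genotype counts. It is not RUTH and does not reproduce any part of the RUTH software; the function names `hwe_chisq` and `hwe_counts`, the sample sizes, and the allele frequencies are hypothetical values chosen only for the example.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Return (chi-square statistic, p-value) for the standard 1-df HWE test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if not 0.0 < p < 1.0:
        raise ValueError("HWE test is undefined for a monomorphic marker")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def hwe_counts(n, p):
    """Expected genotype counts (AA, AB, BB) for n individuals in exact HWE."""
    q = 1.0 - p
    return (n * p * p, 2 * n * p * q, n * q * q)


# Two hypothetical populations, each in exact HWE, with very different
# frequencies of allele A (illustrative values, not taken from the paper).
pop1 = hwe_counts(1000, 0.1)
pop2 = hwe_counts(1000, 0.9)

# Pooling them ignores population structure.
pooled = tuple(a + b for a, b in zip(pop1, pop2))

stat1, pval1 = hwe_chisq(*pop1)
stat_pooled, pval_pooled = hwe_chisq(*pooled)

print(f"population 1: chi2 = {stat1:.3g}, p = {pval1:.3g}")
print(f"pooled:       chi2 = {stat_pooled:.3g}, p = {pval_pooled:.3g}")
```

Each population on its own gives a statistic of essentially zero, while the pooled sample shows a strong heterozygote deficit (a statistic of about 819 for these inputs) even though no genotype is wrong. The example uses hard genotype counts only; it does not model genotype uncertainty, which the abstract names as a second source of inflated false positives.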

© The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved.

Keywords: genotype likelihoods; next-generation sequencing; population structure; principal components analysis
