Bioinformatics. 2021 Jan 18; doi: 10.1093/bioinformatics/btab028. Epub 2021 Jan 18.

SeeCiTe: a method to assess CNV calls from SNP arrays using trio data.

Bioinformatics (Oxford, England)

Ksenia Lavrichenko, Øyvind Helgeland, Pål R Njølstad, Inge Jonassen, Stefan Johansson

Affiliations

  1. Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway.
  2. Department of Clinical Science, University of Bergen, Bergen, Norway.
  3. Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway.
  4. Department of Pediatrics and Adolescents, Haukeland University Hospital, Bergen, Norway.
  5. Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway.

PMID: 33459766 PMCID: PMC8317106 DOI: 10.1093/bioinformatics/btab028

Abstract

MOTIVATION: Single nucleotide polymorphism (SNP) genotyping arrays remain an attractive platform for assaying copy number variants (CNVs) in large population-wide cohorts. However, current tools for calling CNVs are still prone to extensive false positive calls when applied to biobank-scale arrays. Moreover, there is a lack of methods that exploit cohorts with trios available (e.g. nuclear families) to assist in quality control and downstream analyses following the calling.

RESULTS: We developed SeeCiTe (Seeing Cnvs in Trios), a novel CNV quality control tool that post-processes the output of current CNV calling tools, exploiting child-parent trio data to classify calls into quality categories and to provide a set of visualizations for each putative CNV call in the offspring. We apply it to the Norwegian Mother, Father, and Child Cohort Study (MoBa) and show that SeeCiTe improves specificity and sensitivity compared to common empirical filtering strategies. To our knowledge, it is the first tool that utilizes probe-level CNV data in trios (and singletons) to systematically highlight potential artefacts and visualize signal intensities in a streamlined fashion suitable for biobank-scale studies.

AVAILABILITY AND IMPLEMENTATION: The software is implemented in R with the source code freely available at https://github.com/aksenia/SeeCiTe.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

© The Author(s) 2021. Published by Oxford University Press.
