F1000Res. 2016 Apr 13;5:673. doi: 10.12688/f1000research.8290.1. eCollection 2016.

dbVar structural variant cluster set for data analysis and variant comparison.

Lon Phan, Jeffrey Hsu, Le Quang Minh Tri, Michaela Willi, Tamer Mansour, Yan Kai, John Garner, John Lopez, Ben Busby

Affiliations

  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
  2. Cleveland Clinic Lerner Research Institute, Cleveland, OH, USA.
  3. Department of Biotechnology, Ho Chi Minh City International University, Ho Chi Minh, Vietnam.
  4. Laboratory of Genetics and Physiology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA; Division of Bioinformatics, Biocenter, Medical University Innsbruck, Innsbruck, Austria.
  5. Lab for Data Intensive Biology, Department of Population Health and Reproduction, University of California, Davis, CA, USA; Department of Clinical Pathology, University of Mansoura, Mansoura, Egypt.
  6. Cancer Epigenetics Laboratory, Department of Anatomy and Regenerative Biology, The George Washington University, Washington, DC, USA; Department of Physics, The George Washington University, Washington, DC, USA.

PMID: 28357035 PMCID: PMC5345777 DOI: 10.12688/f1000research.8290.1

Abstract

dbVar houses over 3 million submitted structural variants (SSVs) from 120 human studies, including copy number variations (CNVs), insertions, deletions, inversions, translocations, and complex chromosomal rearrangements. Users can submit multiple SSVs to dbVar that are presumably identical but were ascertained by different platforms and samples, which makes it possible to calculate whether a variant is rare or common in the population and allows for cross-validation. However, because SSV genomic location reporting can vary (including fuzzy locations where the start and/or end points are not precisely known), analysis, comparison, annotation, and reporting of SSVs across studies can be difficult. This project was initiated by the Structural Variant Comparison Group to generate a non-redundant set of genomic regions, defined by counts of concordance, for all human SSVs placed on RefSeq assembly GRCh38 (RefSeq accession GCF_000001405.26). We intend that the availability of these regions, called structural variant clusters (SVCs), will facilitate the analysis, annotation, and exchange of SV data and allow for simplified display in genomic sequence viewers for improved variant interpretation. Sets of SVCs were generated by variant type for each of the 120 studies as well as for a combined set across all studies. Starting from 3.64 million SSVs, 2.5 million and 3.4 million non-redundant SVCs with count >=1 were generated by variant type for each study and across all studies, respectively. In addition, we have developed utilities for annotating, searching, and filtering SVC data in GVF format, which support computing summary statistics, exporting data for genomic viewers, and annotating the SVCs using external data sources.
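To make the idea of "counts of concordance" concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes that two SSVs are concordant when they share the same chromosome, start, end, and variant type; the abstract does not state the matching rule, and it does not say how fuzzy endpoints are handled, so this is an illustrative simplification. The `SSV` record and the `cluster_counts` function are hypothetical names introduced here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class SSV(NamedTuple):
    """Hypothetical minimal record for one submitted structural variant."""
    study: str
    chrom: str
    start: int
    end: int
    var_type: str


def cluster_counts(ssvs: Iterable[SSV], per_study: bool = False) -> Counter:
    """Collapse SSVs into non-redundant regions and count supporting SSVs.

    Assumption (not from the source): SSVs are concordant only when
    chromosome, start, end, and variant type all match exactly.
    With per_study=True the study name becomes part of the key, which
    mirrors the per-study sets; otherwise SSVs are pooled across studies.
    """
    counts: Counter = Counter()
    for v in ssvs:
        key = (v.chrom, v.start, v.end, v.var_type)
        if per_study:
            key = (v.study,) + key
        counts[key] += 1
    return counts


# Made-up example records, for illustration only.
example = [
    SSV("study_A", "chr1", 1000, 5000, "deletion"),
    SSV("study_B", "chr1", 1000, 5000, "deletion"),
    SSV("study_B", "chr1", 1000, 5000, "duplication"),
    SSV("study_A", "chr2", 200, 900, "inversion"),
]

combined = cluster_counts(example)                 # pooled across studies
by_study = cluster_counts(example, per_study=True)  # one set per study
```

In this toy input the two matching deletions collapse into a single region with count 2 in the combined set, while the per-study set keeps them separate. A real implementation would also need a policy for imprecise breakpoints, which this sketch leaves out.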

Keywords: Education; GVF; Genome Annotation; Genomics; NCBI; Open-Source; Software; Structural Variation Cluster; dbVar

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Publication Types