
Gigascience. 2021 Dec 02;10(12). doi: 10.1093/gigascience/giab078.

Highly accurate whole-genome imputation of SARS-CoV-2 from partial or low-quality sequences.


Francisco M Ortuño, Carlos Loucera, Carlos S Casimiro-Soriguer, Jose A Lepe, Pedro Camacho Martinez, Laura Merino Diaz, Adolfo de Salazar, Natalia Chueca, Federico García, Javier Perez-Florido, Joaquin Dopazo

Affiliations

  1. Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain.
  2. Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain.
  3. Unidad Clínica Enfermedades Infecciosas, Microbiología y Medicina Preventiva, Hospital Universitario Virgen del Rocío, 41013 Sevilla, Spain.
  4. Servicio de Microbiología, Hospital Universitario San Cecilio, 18016 Granada, Spain.
  5. FPS/ELIXIR-es, Hospital Virgen del Rocío, Sevilla 42013, Spain.
  6. CIBER de Enfermedades Infecciosas (CIBERINFEC), Hospital Universitario San Cecilio, 18016 Granada, Spain.

PMID: 34865008 PMCID: PMC8643610 DOI: 10.1093/gigascience/giab078

Abstract

BACKGROUND: The current SARS-CoV-2 pandemic has emphasized the utility of viral whole-genome sequencing for the surveillance and control of the pathogen. An unprecedented, ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances under which viruses are sequenced, together with the demand for urgent results, cause a high rate of incomplete and, therefore, unusable sequences. Viral sequences evolve within a complex phylogeny, and different positions along the genome are in linkage disequilibrium. Therefore, an imputation method should be able to predict missing positions from the available sequencing data.
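The reasoning above (shared phylogeny plus linkage disequilibrium implies that observed positions carry information about missing ones) can be illustrated with a toy sketch. It is a deliberately simple stand-in, a k-nearest-neighbour majority vote, and not impuSARS's own imputation method. It assumes that all sequences are already aligned to the same reference coordinates and that missing bases are encoded as `N`; the function names `hamming_on_observed` and `impute` and the parameter `k` are hypothetical.

```python
from collections import Counter

MISSING = "N"  # assumed encoding for an uncalled / missing base


def hamming_on_observed(query: str, ref: str) -> int:
    """Count mismatches between query and ref, skipping positions missing in the query."""
    return sum(1 for q, r in zip(query, ref) if q != MISSING and q != r)


def impute(query: str, panel: list[str], k: int = 3) -> str:
    """Fill each missing position in `query` by majority vote among the k panel
    sequences that are closest to the query on its observed positions.

    Toy illustration only (hypothetical helper): relatives in the panel that agree
    with the observed bases are likely to share the unobserved ones as well.
    """
    if any(len(ref) != len(query) for ref in panel):
        raise ValueError("all sequences must be aligned to the same length")
    # The closest relatives, judged only on positions the query actually has.
    neighbours = sorted(panel, key=lambda ref: hamming_on_observed(query, ref))[:k]
    imputed = []
    for i, base in enumerate(query):
        if base != MISSING:
            imputed.append(base)  # keep observed bases untouched
        else:
            votes = Counter(ref[i] for ref in neighbours)
            imputed.append(votes.most_common(1)[0][0])
    return "".join(imputed)


# Two hypothetical "lineages", three identical members each.
panel = ["ACGTACGTAC"] * 3 + ["TCGAACGGAC"] * 3
print(impute("TCGANNNNNC", panel))  # gap filled from the matching lineage
```

Real reference panels are far larger and far more diverse, which is why dedicated imputation tools model haplotype structure probabilistically rather than relying on a plain nearest-neighbour vote.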

RESULTS: We have developed the impuSARS application, which takes advantage of the enormous number of available SARS-CoV-2 genomes, using a reference panel of 239,301 sequences, to impute missing data in viral genomes. ImpuSARS was tested under a wide range of conditions (missing continuous fragments, missing amplicons, or missing sparse individual positions) and showed high fidelity when reconstructing the original sequences, recovering the lineage with 100% precision for almost all lineages, even in very poorly covered genomes (<20% coverage).
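The three test conditions listed above can be sketched as masking functions that turn a complete sequence into a partial one, together with the two quantities such an evaluation needs: the remaining coverage and the accuracy at masked positions. This is an illustrative sketch, not the authors' evaluation code. All function names are hypothetical, and the amplicon scheme is simplified to non-overlapping tiles of fixed length, whereas real amplicon panels use overlapping amplicons.

```python
import random


def mask_fragment(seq: str, start: int, length: int) -> str:
    """Replace one contiguous fragment with 'N' (the 'continuous fragment' condition)."""
    end = min(start + length, len(seq))
    return seq[:start] + "N" * (end - start) + seq[end:]


def mask_amplicons(seq: str, amplicon_len: int, dropped: set[int]) -> str:
    """Blank out whole amplicons, assuming (simplification) non-overlapping
    tiles of `amplicon_len` bases indexed from 0."""
    chars = list(seq)
    for idx in dropped:
        start = idx * amplicon_len
        for i in range(start, min(start + amplicon_len, len(chars))):
            chars[i] = "N"
    return "".join(chars)


def mask_sparse(seq: str, fraction: float, rng: random.Random) -> str:
    """Blank out a random subset of individual positions."""
    n = round(fraction * len(seq))
    positions = set(rng.sample(range(len(seq)), n))
    return "".join("N" if i in positions else b for i, b in enumerate(seq))


def coverage(seq: str) -> float:
    """Fraction of positions that are not 'N'."""
    return 1 - seq.count("N") / len(seq)


def masked_accuracy(truth: str, masked: str, imputed: str) -> float:
    """Fraction of masked positions where the imputed base equals the true base."""
    idx = [i for i, b in enumerate(masked) if b == "N"]
    if not idx:
        return 1.0  # nothing was masked, so nothing could be imputed wrongly
    return sum(truth[i] == imputed[i] for i in idx) / len(idx)


truth = "ACGTACGTAC"
partial = mask_amplicons(truth, 4, {1})
print(partial, coverage(partial))
```

Any imputation routine can then be scored by comparing its output against `truth` with `masked_accuracy`, stratified by the coverage of the masked input.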

CONCLUSIONS: Imputation can speed up SARS-CoV-2 sequencing production by recovering many incomplete or low-quality sequences that would otherwise be discarded. ImpuSARS can be incorporated into any primary data-processing pipeline for SARS-CoV-2 whole-genome sequencing.

© The Author(s) 2021. Published by Oxford University Press GigaScience.

