Comput Ind Eng. 2021 Nov;161:107666. doi: 10.1016/j.cie.2021.107666. Epub 2021 Sep 08.

COVID-19 prediction based on genome similarity of human SARS-CoV-2 and bat SARS-CoV-like coronavirus.

Computers & industrial engineering

Hilal Arslan

Affiliations

  1. Department of Software Engineering, Ankara Yıldırım Beyazıt University, Turkey.

PMID: 34511707 PMCID: PMC8423779 DOI: 10.1016/j.cie.2021.107666

Abstract

This paper proposes an efficient and accurate method to predict coronavirus disease 2019 (COVID-19) based on the genome similarity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and a bat SARS-CoV-like coronavirus. We introduce similarity features that distinguish COVID-19 from other human coronaviruses by comparing human coronaviruses with a bat SARS-CoV-like coronavirus. In the proposed method, each human coronavirus sequence is assigned three similarity scores that account for nucleotide similarities and for mutations that lead to a strong absence of cytosine and guanine nucleotides. Next, the proposed features are integrated with CpG island features of the genome sequences to improve COVID-19 prediction. Thus, each genome sequence is represented by five real numbers. We demonstrate the effectiveness of the proposed features using six machine learning classifiers on a dataset of genome sequences of human coronaviruses similar to SARS-CoV-2. The classifiers perform similarly, and the k-nearest neighbor classifier with similarity features achieves the best results, with an accuracy of 99.2%. Moreover, the k-nearest neighbor classifier with the combined CpG-based and similarity features achieves an accuracy of 99.8%. Experimental results show that the similarity features markedly decrease the number of false negatives and significantly improve overall performance. The superiority of the proposed method is also highlighted by comparison with state-of-the-art studies that detect COVID-19 from genome sequences.
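
The pipeline described in the abstract (three similarity scores plus two CpG-based values per genome, followed by k-nearest neighbor classification) can be illustrated with a minimal Python sketch. The abstract does not give the exact feature definitions, so every function below is a hypothetical stand-in rather than the paper's method: `identity`, `cg_loss`, and the GC-content difference are assumed similarity scores, GC content and the observed/expected CpG ratio are assumed CpG island features, and the sequences are taken to be pre-aligned strings over A, C, G, T.

```python
from collections import Counter
import math


def identity(seq: str, ref: str) -> float:
    """Assumed score: fraction of matching positions over the shared length."""
    n = min(len(seq), len(ref))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(seq, ref)) / n


def cg_loss(seq: str, ref: str) -> float:
    """Assumed score: share of reference C/G positions that are A/T in seq."""
    pairs = [(a, b) for a, b in zip(seq, ref) if b in "CG"]
    if not pairs:
        return 0.0
    return sum(a in "AT" for a, _ in pairs) / len(pairs)


def gc_content(seq: str) -> float:
    """Fraction of C and G nucleotides in the sequence."""
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * N / (count(C) * count(G))."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def features(seq: str, ref: str) -> tuple:
    """Five real numbers per genome: three similarity and two CpG values."""
    return (
        identity(seq, ref),
        cg_loss(seq, ref),
        abs(gc_content(seq) - gc_content(ref)),
        gc_content(seq),
        cpg_obs_exp(seq),
    )


def knn_predict(train, query, k=3):
    """Plain k-nearest neighbor vote using Euclidean distance.

    train is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In such a sketch, `ref` would be the bat SARS-CoV-like coronavirus genome, `features(seq, ref)` would be computed once per human coronavirus genome, and the labeled feature vectors would be passed to `knn_predict`. Real genomes would first need alignment and handling of ambiguous bases, which this sketch omits.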

© 2021 Elsevier Ltd. All rights reserved.

Keywords: Bat coronavirus; Covid-19; CpG islands; Feature extraction; Human coronavirus; Machine learning methods; SARS-CoV-2; Similarity Feature

Publication Types