Bioinformatics. 2021 Jun 28; doi: 10.1093/bioinformatics/btab467. Epub 2021 Jun 28.

EpitopeVec: Linear Epitope Prediction Using Deep Protein Sequence Embeddings.

Bioinformatics (Oxford, England)

Akash Bahai, Ehsaneddin Asgari, Mohammad R K Mofrad, Andreas Kloetgen, Alice C McHardy

Affiliations

  1. Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124 Braunschweig, Germany.
  2. Braunschweig Integrated Center of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig.
  3. Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, 94720, USA.
  4. Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Lab, Berkeley, CA 94720, USA.

PMID: 34180989 PMCID: PMC8652027 DOI: 10.1093/bioinformatics/btab467

Abstract

MOTIVATION: B-cell epitopes (BCEs) play a pivotal role in the development of peptide vaccines, immuno-diagnostic reagents, and antibody production, and thus in infectious disease prevention and diagnostics in general. Experimental methods used to determine BCEs are costly and time-consuming. Therefore, it is essential to develop computational methods for the rapid identification of BCEs. Although several computational methods have been developed for this task, generalizability is still a major concern: cross-testing of classifiers trained and tested on different datasets has revealed accuracies of 51-53%.

RESULTS: We describe a new method called EpitopeVec, which uses a combination of residue properties, modified antigenicity scales, and protein language model-based representations (protein vectors) as features of peptides for linear BCE predictions. Extensive benchmarking of EpitopeVec and other state-of-the-art methods for linear BCE prediction on several large and small datasets, as well as cross-testing, demonstrated an improvement in the performance of EpitopeVec over other methods in terms of accuracy and area under the curve (AUC). As the predictive performance depended on the species origin of the respective antigens (viral, bacterial, eukaryotic), we also trained our method on a large viral dataset to create a dedicated linear viral BCE predictor with improved cross-testing performance.
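The idea of combining several feature families into one peptide representation can be illustrated with a minimal sketch. This is not the EpitopeVec implementation: the function names, the toy per-residue scale, the hash-based stand-in for a learned k-mer embedding, and the choice to average k-mer vectors are all assumptions made for illustration only.

```python
import hashlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy per-residue scale; placeholder values, NOT a published antigenicity scale.
TOY_SCALE = {aa: (i - 9.5) / 9.5 for i, aa in enumerate(AMINO_ACIDS)}

EMBED_DIM = 4  # illustrative size; learned protein vectors are much larger


def composition(peptide):
    """Fraction of each of the 20 standard amino acids in the peptide."""
    n = len(peptide)
    return [peptide.count(aa) / n for aa in AMINO_ACIDS]


def mean_scale(peptide, scale):
    """Average of a per-residue scale over the peptide."""
    return sum(scale[aa] for aa in peptide) / len(peptide)


def toy_kmer_vector(kmer):
    """Deterministic stand-in for a learned k-mer embedding lookup (hypothetical)."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return [digest[i] / 255.0 for i in range(EMBED_DIM)]


def protein_vector(peptide, k=3):
    """Average the vectors of all overlapping k-mers (an assumed pooling choice)."""
    if len(peptide) < k:
        raise ValueError("peptide is shorter than k")
    kmers = [peptide[i:i + k] for i in range(len(peptide) - k + 1)]
    vectors = [toy_kmer_vector(km) for km in kmers]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def peptide_features(peptide):
    """Concatenate the three feature families into one fixed-length vector."""
    return (
        composition(peptide)
        + [mean_scale(peptide, TOY_SCALE)]
        + protein_vector(peptide)
    )
```

A vector built this way has the same length for every peptide of at least three residues, so it could be passed to any standard classifier; which classifier and which scales the authors actually use is described in the full paper, not in this abstract.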

AVAILABILITY: The software is available at https://github.com/hzi-bifo/epitope-prediction.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

© The Author(s) 2021. Published by Oxford University Press.

Publication Types