Database (Oxford). 2019 Jan 01;2019. doi: 10.1093/database/bay147.

Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.

Database : the journal of biological databases and curation

Rezarta Islamaj Dogan, Sun Kim, Andrew Chatr-Aryamontri, Chih-Hsuan Wei, Donald C Comeau, Rui Antunes, Sérgio Matos, Qingyu Chen, Aparna Elangovan, Nagesh C Panyam, Karin Verspoor, Hongfang Liu, Yanshan Wang, Zhuang Liu, Berna Altinel, Zehra Melce Hüsünbeyi, Arzucan Özgür, Aris Fergadis, Chen-Kai Wang, Hong-Jie Dai, Tung Tran, Ramakanth Kavuluru, Ling Luo, Albert Steppi, Jinfeng Zhang, Jinchan Qu, Zhiyong Lu

Affiliations

  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
  2. Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada.
  3. Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.
  4. School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia.
  5. Department of Health Science Research, Mayo Clinic, Rochester, MN, USA.
  6. School of Computer Science and Technology, Dalian University of Technology, Dalian, China.
  7. Department of Computer Engineering, Marmara University, Istanbul, Turkey.
  8. Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey.
  9. School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens, Greece.
  10. Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan.
  11. Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan.
  12. Department of Computer Science, University of Kentucky, Lexington, KY, USA.
  13. Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA.
  14. College of Computer Science and Technology, Dalian University of Technology, Dalian, China.
  15. Department of Statistics, Florida State University, Florida, USA.

PMID: 30689846 PMCID: PMC6348314 DOI: 10.1093/database/bay147

Abstract

The Precision Medicine Initiative is a multicenter effort that aims to formulate personalized treatments by combining individual patient data (clinical, genome sequence and functional genomic data) with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs: it reports the genetic and molecular interactions that form the scaffold of cellular regulatory systems and details how genetic variants influence these interactions. The BioCreative VI Precision Medicine Track aimed to extract this type of information and was organized as two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When the text-mining system predictions were compared with human annotations for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the precision medicine track to be successful in engaging the text-mining research community. The track also produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results on automatically identifying PubMed articles that describe PPIs affected by mutations, as well as on extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world curation of precision medicine information.
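
The evaluation figures quoted in the abstract (precision, recall, F-score and average precision) follow standard definitions, which can be sketched as below. This is a minimal illustration, not the track's official scorer: the function names and the toy document IDs are hypothetical, the average precision shown is the common ranked-retrieval form (the organizers' exact variant is an assumption), and the homolog-aware matching used for relation extraction is not modeled. Note also that the abstract reports the best value of each metric separately, so the quoted precision, recall and F-score need not come from a single system.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 against gold (human) annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(ranked_ids: Sequence[str], gold: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a gold item appears.

    Assumed form: normalized by the total number of gold items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold) if gold else 0.0


# Toy triage example with made-up document IDs (not track data).
gold_docs = {"d1", "d2", "d3"}
system_docs = {"d1", "d2", "d4", "d5"}
system_ranking = ["d1", "d4", "d2", "d5", "d3"]

p, r, f1 = precision_recall_f1(system_docs, gold_docs)
ap = average_precision(system_ranking, gold_docs)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f} average_precision={ap:.4f}")
```

For the relation extraction task the same set-based function could in principle be applied to sets of protein pairs instead of document IDs, provided each pair is normalized to an order-independent key first.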

