JMIR Public Health Surveill. 2017 May 03;3(2):e24. doi: 10.2196/publichealth.6396.

TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations.

JMIR public health and surveillance

Nestor Alvaro, Yusuke Miyao, Nigel Collier

Affiliations

  1. National Institute of Informatics, Department of Informatics, Tokyo, Japan.
  2. The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, Japan.
  3. Faculty of Modern & Medieval Languages, Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, United Kingdom.

PMID: 28468748 PMCID: PMC5438461 DOI: 10.2196/publichealth.6396

Abstract

BACKGROUND: Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines. As a result, there is no comparable set of documents from both Twitter and PubMed annotated in the same manner.

OBJECTIVE: This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information. The corpus allows researchers who apply natural language processing (NLP) to pharmacovigilance to run experiments that clarify the similarities and differences between drug reports in Twitter and PubMed.

METHODS: We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines.

RESULTS: The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes.

CONCLUSIONS: We present a corpus that is unique in its characteristics: it is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest to researchers wishing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that, given the comprehensive set of drug names and the annotated entities and relations, this corpus will become a standard resource for comparing results from different pharmacovigilance studies in the area of NLP.

©Nestor Alvaro, Yusuke Miyao, Nigel Collier. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.05.2017.

Keywords: PubMed; Twitter; annotation; corpus; natural language processing; pharmacovigilance; text mining

Publication Types