Comput Intell. 2011 Nov;27(4):681-701. doi: 10.1111/j.1467-8640.2011.00405.x.

HIGH-PRECISION BIOLOGICAL EVENT EXTRACTION: EFFECTS OF SYSTEM AND OF DATA.

Computational intelligence

K Bretonnel Cohen, Karin Verspoor, Helen L Johnson, Chris Roeder, Philip V Ogren, William A Baumgartner, Elizabeth White, Hannah Tipney, Lawrence Hunter

Affiliations

  1. Center for Computational Pharmacology, University of Colorado Denver School of Medicine, Aurora, CO, USA.

PMID: 25937701 PMCID: PMC4414063 DOI: 10.1111/j.1467-8640.2011.00405.x

Abstract

We approached the problems of event detection, argument identification, and negation and speculation detection in the BioNLP'09 information extraction challenge through concept recognition and analysis. Our methodology used the OpenDMAP semantic parser with manually written rules. For this challenge, the original OpenDMAP system was updated with a broad ontology defined for the events of interest, new linguistic patterns for those events, and specialized coordination handling. We achieved state-of-the-art precision on two of the three tasks: the highest precision of 24 teams on Task 1 (71.81) and the highest precision of 6 teams on Task 2 (70.97). We provide a detailed analysis of the training data and show that a number of trigger words are ambiguous as to event type, even when their arguments are constrained by semantic class. The data is also shown to have a number of missing annotations. Analysis of a sample of the comparatively small number of false positives returned by our system shows that the major causes of this type of error were failure to recognize second themes in two-theme events, failure to recognize events that were arguments to other events, failure to recognize nontheme arguments, and sentence segmentation errors. We show that specifically handling coordination had a small but important impact on the overall performance of the system. The OpenDMAP system and the rule set are available at http://bionlp.sourceforge.net.

Keywords: BioNLP; conceptual analysis; event recognition; natural language processing; text mining

