JMIR Med Inform. 2019 Apr 21;7(2):e12109. doi: 10.2196/12109.

Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports.

Sunyang Fu, Lester Y Leung, Yanshan Wang, Anne-Olivia Raulli, David F Kallmes, Kristin A Kinsman, Kristoff B Nelson, Michael S Clark, Patrick H Luetmer, Paul R Kingsbury, David M Kent, Hongfang Liu

Affiliations

  1. Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.
  2. Department of Neurology, Tufts Medical Center, Boston, MA, United States.
  3. Department of Radiology, Mayo Clinic, Rochester, MN, United States.
  4. Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States.

PMID: 31066686 PMCID: PMC6524454 DOI: 10.2196/12109

Abstract

BACKGROUND: Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be caused by vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by enabling preventive treatment. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings that radiologists report in neuroimaging reports.

OBJECTIVE: This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center.

METHODS: Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for the rule-based system, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models were a convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithms was compared with a manually created gold standard. The gold standard dataset included 1000 radiology reports randomly retrieved from the 2 study sites (Mayo and Tufts), corresponding to patients with no prior or current diagnosis of stroke or dementia. Of the 1000 reports, 400 were randomly sampled and double read to determine interannotator agreement. The gold standard dataset was split equally into 3 subsets for training, development, and testing.
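The abstract does not say how pointwise mutual information was computed. As a minimal sketch only, the Python below scores each word by the PMI between its presence in a report and the positive (SBI) label, using PMI(w, y) = log2(p(w, y) / (p(w) p(y))). The function name `pmi_scores`, the document-level presence counting, and the toy reports are all assumptions made for illustration and are not taken from the study or from MedTagger.

```python
import math
from collections import Counter


def pmi_scores(docs, labels, positive=1):
    """PMI (in bits) between a word's presence in a report and the positive label.

    docs   : list of token lists, one per report (hypothetical input format)
    labels : list of class labels aligned with docs
    Only words that occur in at least one positive report receive a score.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    word_count = Counter()      # reports containing the word
    word_pos_count = Counter()  # positive reports containing the word
    for tokens, y in zip(docs, labels):
        for w in set(tokens):   # count presence once per report
            word_count[w] += 1
            if y == positive:
                word_pos_count[w] += 1

    p_y = n_pos / n
    scores = {}
    for w, c_pos in word_pos_count.items():
        p_joint = c_pos / n
        p_w = word_count[w] / n
        scores[w] = math.log2(p_joint / (p_w * p_y))
    return scores


# Toy, invented reports (1 = SBI mentioned as a finding, 0 = not).
toy_docs = [
    ["chronic", "infarct"],
    ["no", "acute", "infarct"],
    ["normal", "study"],
    ["old", "lacunar", "infarct"],
]
toy_labels = [1, 0, 0, 1]

toy_scores = pmi_scores(toy_docs, toy_labels)
for word, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

In this toy data, words that appear only in positive reports rank above "infarct", which also appears in a negated context. That is one reason high-PMI words would typically be reviewed and combined with context rules rather than used alone.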

RESULTS: Among the 400 reports selected to determine interannotator agreement, 5 reports were removed because of invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively.
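For readers less familiar with the five reported statistics, the sketch below shows their standard definitions in terms of confusion-matrix counts. The function name `diagnostic_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary classification statistics from confusion-matrix counts.

    tp, fp, tn, fn : true positives, false positives, true negatives,
                     false negatives. A ratio with a zero denominator
                     is returned as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on positive reports
        "specificity": ratio(tn, tn + fp),   # recall on negative reports
        "ppv": ratio(tp, tp + fp),           # precision
        "npv": ratio(tn, tn + fn),
    }


# Invented counts: no false positives, one missed positive.
example = diagnostic_metrics(tp=9, fp=0, tn=90, fn=1)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With zero false positives, specificity and PPV are both exactly 1, while sensitivity and NPV fall below 1 because of the missed case. This is the same qualitative pattern as the rule-based SBI results reported above.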

CONCLUSIONS: We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested a high feasibility of detecting SBIs and WMDs from EHRs using NLP.

©Sunyang Fu, Lester Y Leung, Yanshan Wang, Anne-Olivia Raulli, David F Kallmes, Kristin A Kinsman, Kristoff B Nelson, Michael S Clark, Patrick H Luetmer, Paul R Kingsbury, David M Kent, Hongfang Liu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.04.2019.

Keywords: electronic health records; natural language processing; neuroimaging
