EGEMS (Wash DC). 2016 Jun 01;4(1):1217. doi: 10.13063/2327-9214.1217. eCollection 2016.

Performance of a Natural Language Processing (NLP) Tool to Extract Pulmonary Function Test (PFT) Reports from Structured and Semistructured Veteran Affairs (VA) Data.

Brian C Sauer, Barbara E Jones, Gary Globe, Jianwei Leng, Chao-Chin Lu, Tao He, Chia-Chen Teng, Patrick Sullivan, Qing Zeng

Affiliations

  1. Salt Lake IDEAS Center, Veteran Affairs; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah.
  2. Amgen Inc.
  3. Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah.
  4. Department of Pharmacy Practice, School of Pharmacy, Regis University.
  5. Salt Lake IDEAS Center, Veteran Affairs; Department of Biomedical Informatics, School of Medicine, University of Utah.

PMID: 27376095 PMCID: PMC4909376 DOI: 10.13063/2327-9214.1217

Abstract

INTRODUCTION/OBJECTIVE: Pulmonary function tests (PFTs) are objective estimates of lung function, but are not reliably stored within the Veteran Health Affairs data systems as structured data. The aim of this study was to validate the natural language processing (NLP) tool we developed, which extracts spirometric values and responses to bronchodilator administration, against expert review, and to estimate the number of additional spirometric tests identified beyond the structured data.

METHODS: All patients at seven Veteran Affairs Medical Centers with a diagnostic code for asthma between January 1, 2006 and December 31, 2012 were included. Evidence of spirometry with a bronchodilator challenge (BDC) was extracted from structured data as well as clinical documents. The NLP tool's performance was compared against a human reference standard using a random sample of 1,001 documents.

RESULTS: In the validation set, NLP demonstrated a precision of 98.9 percent (95 percent confidence interval (CI): 93.9 percent, 99.7 percent), recall of 97.8 percent (95 percent CI: 92.2 percent, 99.7 percent), and an F-measure of 98.3 percent for the forced vital capacity pre- and post-bronchodilator pairs. For the forced expiratory volume in one second pre- and post-bronchodilator pairs, precision was 100 percent (95 percent CI: 96.6 percent, 100 percent), recall was 100 percent (95 percent CI: 96.6 percent, 100 percent), and the F-measure was 100 percent. Application of the NLP increased the proportion identified with a complete bronchodilator challenge by 25 percent.
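The reported F-measures can be checked against the reported precision and recall. The abstract does not say which F-measure was used; the sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall), and the function name `f_measure` is illustrative rather than taken from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, as fractions.
fvc = f_measure(0.989, 0.978)   # forced vital capacity pairs
fev1 = f_measure(1.0, 1.0)      # forced expiratory volume in one second pairs

print(f"FVC pairs:  {100 * fvc:.1f} percent")   # 98.3 percent
print(f"FEV1 pairs: {100 * fev1:.1f} percent")  # 100.0 percent
```

Under that assumption, both computed values agree with the figures in the abstract.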

DISCUSSION/CONCLUSION: This technology can improve identification of PFTs for epidemiologic research. Caution must be taken in assuming that a single domain of clinical data can completely capture the scope of a disease, treatment, or clinical test.

Keywords: Data Reuse; Data Use and Quality; Electronic Health Record (EHR); Informatics; Natural Language Processing; Outcomes Assessment; Pulmonary Disease; asthma; bronchodilator challenge; natural language processing; pulmonary function

Publication Types