JMIR Med Inform. 2016 Nov 30;4(4):e40. doi: 10.2196/medinform.6373.

Finding Important Terms for Patients in Their Electronic Health Records: A Learning-to-Rank Approach Using Expert Annotations.

Jinying Chen, Jiaping Zheng, Hong Yu

Affiliations

  1. Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States.
  2. School of Computer Science, University of Massachusetts, Amherst, MA, United States.
  3. Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States.

PMID: 27903489 PMCID: PMC5156821 DOI: 10.2196/medinform.6373

Abstract

BACKGROUND: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients' notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on the medical terms that matter most to them. Targeted educational interventions can then be developed around those terms to improve patients' EHR comprehension and the quality of their care.

OBJECTIVE: We aimed to develop a supervised natural language processing (NLP) system, called Finding impOrtant medical Concepts most Useful to patientS (FOCUS), that automatically identifies and ranks medical terms in EHR notes based on their importance to patients.

METHODS: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians' agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine (SVM)-based learning-to-rank algorithm. We explored rich learning features, including distributed word representations, Unified Medical Language System semantic types, topic features, and features derived from consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems.
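
To make the ranking step concrete, the sketch below illustrates the general pairwise ranking-SVM idea: train a linear SVM on feature-vector differences between important and unimportant terms from the same note, then score individual terms with the learned weights. This is a minimal illustration on synthetic data, not the authors' implementation; the feature extraction pipeline (MetaMap candidates, UMLS semantic types, word embeddings, consumer health vocabulary) is omitted, and all variable names are assumptions.

```python
# Minimal pairwise learning-to-rank sketch (ranking-SVM style).
# X, y, and notes below are synthetic stand-ins: each row of X is a
# candidate term's feature vector, y is a hypothetical expert label
# (1 = important), and notes groups terms by the EHR note ("query").
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, note_ids):
    """Build difference vectors for important/unimportant term pairs
    drawn from the same note."""
    diffs, labels = [], []
    for note in np.unique(note_ids):
        idx = np.where(note_ids == note)[0]
        pos = idx[y[idx] == 1]   # terms labeled important
        neg = idx[y[idx] == 0]   # remaining candidate terms
        for i in pos:
            for j in neg:
                diffs.append(X[i] - X[j])
                labels.append(1)
                diffs.append(X[j] - X[i])  # mirrored pair balances classes
                labels.append(-1)
    return np.asarray(diffs), np.asarray(labels)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))          # 60 candidate terms, 8 features
y = rng.integers(0, 2, size=60)       # hypothetical expert labels
notes = rng.integers(0, 6, size=60)   # which of 6 notes each term is from

# A linear SVM trained on pairwise differences yields a weight vector
# that scores single terms; sorting by score gives the ranking.
X_pairs, y_pairs = pairwise_transform(X, y, notes)
ranker = LinearSVC(C=1.0).fit(X_pairs, y_pairs)
scores = X @ ranker.coef_.ravel()
ranking = np.argsort(-scores)         # term indices, most important first
```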

RESULTS: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen kappa for annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. When term identification was included, FOCUS achieved an AUC-ROC of 0.866 for identifying important terms from EHR notes. Both scores significantly exceeded those of the corresponding baseline systems (P<.001). The rich learning features contributed substantially to FOCUS's performance.
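
For reference, the two evaluation measures reported above can be computed with scikit-learn as sketched below. The toy labels and scores are illustrative assumptions only, standing in for the physicians' term annotations and FOCUS's ranking scores; the study itself used 10-fold cross-validation over the 90 annotated notes.

```python
# Sketch of the two reported metrics on toy data (not study data).
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Interannotator agreement: each physician's binary judgment per term.
annotator_a = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohen_kappa_score(annotator_a, annotator_b))   # Cohen kappa

# Ranking quality: AUC-ROC compares system term scores against the
# gold-standard importance labels.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
system_scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
print(roc_auc_score(gold, system_scores))            # AUC-ROC
```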

CONCLUSIONS: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may support the development of future interventions that improve the quality of care.

©Jinying Chen, Jiaping Zheng, Hong Yu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 30.11.2016.

Keywords: electronic health records; information extraction; learning to rank; natural language processing; supervised learning

Conflict of interest statement

Conflicts of Interest: None declared.
