
JCO Clin Cancer Inform. 2017;1. doi: 10.1200/CCI.16.00045. Epub 2017 Jun 08.

Automating the Determination of Prostate Cancer Risk Strata From Electronic Medical Records.

JCO clinical cancer informatics

Justin R Gregg, Maximilian Lang, Lucy L Wang, Matthew J Resnick, Sandeep K Jain, Jeremy L Warner, Daniel A Barocas

Affiliations

  1. Vanderbilt University Medical Center.
  2. Vanderbilt University School of Medicine, Nashville.

PMID: 29541700 PMCID: PMC5847303 DOI: 10.1200/CCI.16.00045

Abstract

PURPOSE: Risk stratification underlies system-wide efforts to promote the delivery of appropriate prostate cancer care. Although the elements of risk stratum are available in the electronic medical record, manual data collection is resource intensive. Therefore, we investigated the feasibility and accuracy of an automated data extraction method using natural language processing (NLP) to determine prostate cancer risk stratum.

METHODS: Manually collected clinical stage, biopsy Gleason score, and preoperative prostate-specific antigen (PSA) values from our prospective prostatectomy database were used to categorize patients as low, intermediate, or high risk by D'Amico risk classification. NLP algorithms were developed to automate the extraction of the same data points from the electronic medical record, and risk strata were recalculated. Recall, defined as the ability of NLP to identify elements sufficient to calculate risk, was measured, and agreement between NLP-derived and manually collected data was assessed using the weighted Cohen's κ statistic.
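To make the classification step concrete, the following is a minimal Python sketch of D'Amico risk assignment from the three elements named above. The abstract does not state the cut points, so the thresholds here are the commonly cited D'Amico criteria and should be read as an assumption: PSA of 10 ng/mL or less, Gleason score of 6 or less, and stage T1 to T2a for low risk; PSA above 10 up to 20 ng/mL, Gleason score 7, or stage T2b for intermediate risk; anything higher for high risk. The function name `damico_risk`, the stage table, and the handling of missing values are hypothetical and are not the authors' implementation.

```python
from typing import Optional

# Strata ordered from lowest to highest risk so they can be compared by index.
STRATA = ("low", "intermediate", "high")

# Assumed mapping from clinical T stage to the stratum index it implies alone.
# Stages beyond T2c are treated as high risk here (an assumption).
_STAGE_RISK = {
    "T1a": 0, "T1b": 0, "T1c": 0, "T2a": 0,
    "T2b": 1,
    "T2c": 2, "T3a": 2, "T3b": 2, "T4": 2,
}


def damico_risk(psa: Optional[float],
                gleason: Optional[int],
                stage: Optional[str]) -> Optional[str]:
    """Return 'low', 'intermediate', or 'high', or None if any element is missing."""
    if psa is None or gleason is None or stage not in _STAGE_RISK:
        # Insufficient elements: this patient would not count toward recall.
        return None
    psa_risk = 0 if psa <= 10 else (1 if psa <= 20 else 2)
    gleason_risk = 0 if gleason <= 6 else (1 if gleason == 7 else 2)
    # The overall stratum is driven by the worst of the three elements.
    return STRATA[max(psa_risk, gleason_risk, _STAGE_RISK[stage])]
```

Returning `None` when an element is missing mirrors the distinction the abstract draws between recall (could risk be calculated at all?) and agreement (was the calculated stratum the same as the manual one?).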

RESULTS: Of the 2,352 patients with available data who underwent prostatectomy from 2010 to 2014, NLP identified sufficient elements to calculate risk for 1,833 (recall, 78%). NLP had a 91% raw agreement with manual risk stratification (κ = 0.92; 95% CI, 0.90 to 0.93). The κ statistics for PSA, Gleason score, and clinical stage extraction by NLP were 0.86, 0.91, and 0.89, respectively; 91.9% of extracted PSA values were within ± 1.0 ng/mL of the manually collected PSA levels.
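The reported metrics can be illustrated with a short Python sketch. It shows recall as the fraction of patients with sufficient extracted elements, a weighted Cohen's κ over ordinal category indices, and the share of extracted values within a tolerance of the manual values. The abstract does not say whether linear or quadratic weights were used, so the linear weighting below is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def recall(n_extracted: int, n_total: int) -> float:
    """Fraction of patients for whom sufficient elements were extracted."""
    return n_extracted / n_total


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_categories: int) -> float:
    """Linearly weighted Cohen's kappa for two ratings given as ordinal indices."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n, k = len(a), n_categories
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for i, j in zip(a, b):
        observed[i][j] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight (assumed)
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]
    if disagree_exp == 0.0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return 1.0 - disagree_obs / disagree_exp


def fraction_within(manual: Sequence[float], extracted: Sequence[float],
                    tol: float = 1.0) -> float:
    """Share of extracted values within +/- tol of the paired manual values."""
    hits = sum(1 for m, e in zip(manual, extracted) if abs(m - e) <= tol)
    return hits / len(manual)
```

With the counts given above, `recall(1833, 2352)` rounds to 0.78, matching the reported recall. The other two functions need patient-level pairs, which the abstract does not provide.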

CONCLUSION: NLP can achieve more than 90% accuracy on D'Amico risk stratification of localized prostate cancer, with adequate recall. This figure is comparable to that reported for other NLP tasks and illustrates the known trade-off between recall and accuracy. Automating the collection of risk characteristics could be used to power real-time decision support tools and scale up quality measurement in cancer care.

Conflict of interest statement

AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated. Relationsh

