J Am Med Inform Assoc. 2021 Nov 25;28(12):2661-2669. doi: 10.1093/jamia/ocab207.
Transferability of neural network clinical deidentification systems.
Journal of the American Medical Informatics Association : JAMIA
Kahyun Lee, Nicholas J Dobbins, Bridget McInnes, Meliha Yetisgen, Özlem Uzuner
Affiliations
- Department of Information Science and Technology, George Mason University, Fairfax, Virginia, USA.
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA.
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA.
PMID: 34586386
PMCID: PMC8633667 DOI: 10.1093/jamia/ocab207
Abstract
OBJECTIVE: Neural network deidentification studies have focused on individual datasets. These studies assume the availability of a sufficient amount of human-annotated data to train models that can generalize to corresponding test data. In real-world situations, however, researchers often have limited or no in-house training data. Existing systems and external data can help jump-start deidentification on in-house data; however, the most efficient way of utilizing existing systems and external data is unclear. This article investigates the transferability of a state-of-the-art neural clinical deidentification system, NeuroNER, across a variety of datasets, when it is modified architecturally for domain generalization and when it is trained strategically for domain transfer.
MATERIALS AND METHODS: We conducted a comparative study of the transferability of NeuroNER using 4 clinical note corpora with multiple note types from 2 institutions. We modified NeuroNER architecturally to integrate 2 types of domain generalization approaches. We evaluated each architecture using 3 training strategies. We measured transferability from external sources; transferability across note types; the contribution of external source data when in-domain training data are available; and transferability across institutions.
RESULTS AND CONCLUSIONS: Transferability from a single external source gave inconsistent results. Using additional external sources consistently yielded an F1-score of approximately 80%. Fine-tuning emerged as a dominant transfer strategy, with or without domain generalization. We also found that external sources were useful even in cases where in-domain training data were available. Transferability across institutions differed by note type and annotation label but resulted in improved performance.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Keywords: deidentification; domain generalization; generalizability; transferability