Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/30457
Title: | De-identification of Clinical Notes Using Contextualized Language Models and a Token Classifier |
Authors: | Santos, Joaquim; Santos, Henrique; Tabalipa, Fabio; Vieira, Renata |
Keywords: | Electronic health records; Named entity recognition |
Issue Date: | Nov-2021 |
Publisher: | Springer |
Citation: | Santos J., dos Santos H.D.P., Tabalipa F., Vieira R. (2021) De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier. In: Britto A., Valdivia Delgado K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_3 |
Abstract: | The de-identification of clinical notes is crucial for the reuse of electronic clinical data and is a common Named Entity Recognition (NER) task. Neural language models provide a great improvement in Natural Language Processing (NLP) tasks, such as NER, when they are integrated with neural network methods. This paper evaluates the use of current state-of-the-art deep learning methods (Bi-LSTM-CRF) in the task of identifying patient names in clinical notes, for de-identification purposes. We used two corpora and three language models to evaluate which combination delivers the best performance. In our experiments, the specific corpus for the de-identification of clinical notes and a contextualized embedding with word embeddings achieved the best result: an F-measure of 0.94. |
URI: | http://hdl.handle.net/10174/30457 |
Type: | article |
Appears in Collections: | CIDEHUS - Artigos em Livros de Actas/Proceedings |