Repositório Digital de Publicações Científicas: De-identification of Clinical Notes Using Contextualized Language Models and a Token Classifier


Repositório Digital de Publicações Científicas da Universidade de Évora

Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/30457

Title:	De-identification of Clinical Notes Using Contextualized Language Models and a Token Classifier
Authors:	Santos, Joaquim; Santos, Henrique; Tabalipa, Fabio; Vieira, Renata
Keywords:	Electronic health records; Named entity recognition
Issue Date:	Nov-2021
Publisher:	Springer
Citation:	Santos J., dos Santos H.D.P., Tabalipa F., Vieira R. (2021) De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier. In: Britto A., Valdivia Delgado K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_3
Abstract:	The de-identification of clinical notes is crucial for the reuse of electronic clinical data and is a common Named Entity Recognition (NER) task. Neural language models provide a great improvement in Natural Language Processing (NLP) tasks, such as NER, when they are integrated with neural network methods. This paper evaluates the use of current state-of-the-art deep learning methods (Bi-LSTM-CRF) in the task of identifying patient names in clinical notes, for de-identification purposes. We used two corpora and three language models to evaluate which combination delivers the best performance. In our experiments, the specific corpus for the de-identification of clinical notes and a contextualized embedding with word embeddings achieved the best result: an F-measure of 0.94.
URI:	http://hdl.handle.net/10174/30457
Type:	article
Appears in Collections:	CIDEHUS - Artigos em Livros de Actas/Proceedings

Files in This Item:

File	Description	Size	Format
BRACIS___Anony.pdf		216.19 kB	Adobe PDF

Serviços de Ciência e Cooperação - Universidade de Évora