Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/29161

Title: Embeddings for Named Entity Recognition in Geoscience Portuguese Literature
Authors: Consoli, Bernardo
Santos, Joaquim
Gomes, Diogo
Cordeiro, Fabio
Vieira, Renata
Moreira, Viviane
Keywords: Language models
Named entities
Issue Date: May-2020
Publisher: LREC
Citation: CONSOLI, Bernardo, et al. Embeddings for Named Entity Recognition in Geoscience Portuguese Literature. In: Proceedings of The 12th Language Resources and Evaluation Conference. 2020. p. 4625-4630.
Abstract: This work focuses on Portuguese Named Entity Recognition (NER) in the Geology domain. The only domain-specific dataset in the Portuguese language annotated for Named Entity Recognition is the GeoCorpus. Our approach relies on Bidirectional Long Short-Term Memory with Conditional Random Fields (BiLSTM-CRF) neural networks, a widely used type of network for this area of research, that use vector and tensor embedding representations. We used three types of embedding models (Word Embeddings, Flair Embeddings, and Stacked Embeddings) under two versions (domain-specific and generalized). We originally trained the domain-specific Flair Embeddings model with a generalized context in mind, but then fine-tuned it with domain-specific Oil and Gas corpora, as there were simply not enough domain corpora to properly train such a model. We evaluated each of these embeddings separately, as well as stacked with another embedding. Finally, we achieved state-of-the-art results for this domain with one of our embeddings, and we performed an error analysis on the language model that achieved the best results. Furthermore, we investigated the effects of domain-specific versus generalized embeddings.
URI: https://www.aclweb.org/anthology/2020.lrec-1.568/
http://hdl.handle.net/10174/29161
Type: article
Appears in Collections: CIDEHUS - Artigos em Livros de Actas/Proceedings

Files in This Item:

File: 2020.lrec-1.568.pdf
Size: 184.13 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
