Repositório Digital de Publicações Científicas: Word Embedding Evaluation in Downstream Tasks and Semantic Analogies



Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/29657

Title:	Word Embedding Evaluation in Downstream Tasks and Semantic Analogies
Authors:	Santos, Joaquim; Consoli, Bernardo; Vieira, Renata
Keywords:	Language models; Evaluation
Issue Date:	May-2020
Publisher:	LREC/ELRA
Citation:	SANTOS, Joaquim; CONSOLI, Bernardo; VIEIRA, Renata. Word Embedding Evaluation in Downstream Tasks and Semantic Analogies. In: Proceedings of The 12th Language Resources and Evaluation Conference. 2020. p. 4828-4834.
Abstract:	Language Models have long been a prolific area of study in the field of Natural Language Processing (NLP). Word Embeddings (WE) are among the newer and most widely used kinds of language models. WE are vector space representations of a vocabulary, learned by an unsupervised neural network from the contexts in which words appear. WE have been widely used in downstream tasks across many areas of NLP, usually as features in the processing of textual data. This paper presents the evaluation of newly released WE models for the Portuguese language, trained on a corpus of 4.9 billion tokens. The first evaluation was an intrinsic task in which the WE models had to correctly build semantic and syntactic relations. The second evaluation was an extrinsic one in which the WE models were used in two downstream tasks: Named Entity Recognition and Semantic Similarity between Sentences. Our results show that a diverse and comprehensive corpus can often outperform a larger, less textually diverse corpus, and that passing the text in parts to the WE-generating algorithm may cause a loss of quality.
URI:	https://www.aclweb.org/anthology/2020.lrec-1.594.pdf; http://hdl.handle.net/10174/29657
Type:	article
Appears in Collections:	CIDEHUS - Artigos em Livros de Actas/Proceedings
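
Illustrative sketch (not part of the repository record): the abstract describes an intrinsic evaluation in which embeddings must build semantic and syntactic relations. One common way to score such analogies is vector arithmetic followed by a nearest-neighbour search under cosine similarity. The record does not confirm that the paper uses exactly this procedure, so treat the method as an assumption. The function names and the tiny hand-made vectors below are hypothetical and chosen only so the example runs without external data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c.

    The three query words are excluded from the candidates, as is usual
    for this style of evaluation.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(1 for a, b, c, d in questions if solve_analogy(emb, a, b, c) == d)
    return correct / len(questions)


# Hypothetical toy embeddings: made-up 3-dimensional vectors, not real model output.
TOY_EMBEDDINGS = {
    "homem": [1.0, 0.0, 0.0],
    "mulher": [1.0, 1.0, 0.0],
    "rei": [1.0, 0.0, 1.0],
    "rainha": [1.0, 1.0, 1.0],
    "casa": [0.0, 0.2, 0.1],
}

QUESTIONS = [
    ("homem", "mulher", "rei", "rainha"),
    ("rei", "rainha", "homem", "mulher"),
]

if __name__ == "__main__":
    print(solve_analogy(TOY_EMBEDDINGS, "homem", "mulher", "rei"))
    print(analogy_accuracy(TOY_EMBEDDINGS, QUESTIONS))
```

With real embeddings, the dictionary would be replaced by vectors loaded from a trained model and the question list by a Portuguese analogy test set. Neither is specified in this record.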

Files in This Item:

File	Description	Size	Format
artigo.pdf		183.58 kB	Adobe PDF
