Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/34684
Title: | Generating a European Portuguese BERT Based Model Using Content from Arquivo.pt Archive |
Authors: | Miquelina, Nuno; Quaresma, Paulo; Nogueira, Vitor |
Editors: | Yin, Hujun; Camacho, David; Tino, Peter |
Keywords: | BERT Portuguese European Arquivo.pt |
Issue Date: | 21-Nov-2022 |
Publisher: | Springer International Publishing |
Citation: | Miquelina, N., Quaresma, P., Nogueira, V.B. (2022). Generating a European Portuguese BERT Based Model Using Content from Arquivo.pt Archive. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_28 |
Abstract: | Building a language model from freely available internet content involves several steps and challenges. The model presented here is a BERT-based language model for European Portuguese that is not targeted at any specific context. The corpus was built using the web page archive infrastructure provided by Arquivo.pt and was restricted to .pt domains. This paper describes the overall process of building the corpus and training the BERT model. |
URI: | http://hdl.handle.net/10174/34684 |
Type: | bookPart |
Appears in Collections: | INF - Publicações - Capítulos de Livros |