Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/29061
|
Title: | Intrinsic and Extrinsic Evaluation of the Quality of Biomedical Embeddings in Different Languages |
Authors: | Franceschini, Paula; Santos, Henrique; Vieira, Renata |
Keywords: | Language Models; Health Informatics |
Issue Date: | Jul-2020 |
Publisher: | IEEE |
Citation: | P. M. Franceschini, H. D. P. dos Santos and R. Vieira, "Intrinsic and Extrinsic Evaluation of the Quality of Biomedical Embeddings in Different Languages," 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 2020, pp. 271-276, doi: 10.1109/CBMS49503.2020.00058. |
Abstract: | Lately, language models have been applied to several tasks in biomedical natural language processing. Some public language models are available online, each built with different corpora. In this paper, we evaluate different public word embedding models trained with both general and biomedical corpora for English and Portuguese. We present intrinsic evaluations based on semantic analogies that use word pairs extracted from the MeSH biomedical thesaurus and also from benchmarks that are available for general-domain evaluation. For extrinsic evaluations, we rely on a classification task over Electronic Health Records. Our experiments show that biomedical embeddings can better capture semantics for biomedical analogies in both languages. On the other hand, for extrinsic evaluation, based on classification tasks using the language models, larger general textual corpora appeared equally or more effective. |
URI: | https://doi.org/10.1109/CBMS49503.2020.00058 https://ieeexplore.ieee.org/document/9182968 http://hdl.handle.net/10174/29061 |
Type: | article |
Appears in Collections: | CIDEHUS - Artigos em Livros de Actas/Proceedings |