Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/17099
|
Title: | An Approach to the POS Tagging Problem Using Genetic Algorithms |
Authors: | Silva, Ana Paula; Silva, Arlindo; Pimenta Rodrigues, Irene |
Editors: | Madani, Kurosh; Correia, Dourado Antonio; Rosa, Agostinho; Filipe, Joaquim |
Keywords: | Part-of-speech Tagging; Disambiguation; Evolutionary Algorithms; Natural Language Processing |
Issue Date: | 2015 |
Publisher: | Springer International Publishing |
Citation: | Ana Paula Silva, Arlindo Silva, Irene Rodrigues. An Approach to the POS Tagging Problem Using Genetic Algorithms. Chapter in Computational Intelligence, Volume 577 of the series Studies in Computational Intelligence, pp. 3-17. Springer, 2015 |
Abstract: | Automatic part-of-speech tagging is the process of automatically assigning a part-of-speech (POS) tag to each word of a text. The words of a language are grouped into grammatical categories that represent the function they may have in a sentence. These grammatical classes (or categories) are usually called parts of speech. However, in most languages, a large number of words can be used in different ways and therefore have more than one possible part of speech. To choose the right tag for a particular word, a POS tagger must consider the parts of speech of the surrounding words. The neighboring words may themselves have more than one possible tag. This means that, in order to solve the problem, we need a method to disambiguate a word's set of possible tags. In this work, we model the part-of-speech tagging problem as a combinatorial optimization problem, which we solve using a genetic algorithm. The search for the best combinatorial solution is guided by a set of disambiguation rules that we first discovered using a classification algorithm, which also includes a genetic algorithm. By using rules to disambiguate the tagging, we were able to generalize the context information present in the training tables adopted by approaches based on probabilistic data. We were also able to incorporate other types of information that help to identify a word's grammatical class. The results obtained on two different corpora are among the best published. |
URI: | http://dx.doi.org/10.1007/978-3-319-11271-8_1 http://hdl.handle.net/10174/17099 |
Type: | bookPart |
Appears in Collections: | INF - Publicações - Capítulos de Livros
|