Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/4582
|
Title: | Polylingual text classification in the legal domain |
Authors: | Gonçalves, Teresa; Quaresma, Paulo |
Keywords: | Polylingual text classification; Support vector machines |
Issue Date: | 2011 |
Publisher: | Edizioni Scientifiche Italiane |
Abstract: | With the globalization trend, there is a large number of documents written in different languages. If these polylingual documents are already organized into existing categories, one can build a learning model to classify newly arrived polylingual documents. Although one can adopt a naïve approach that treats the problem as multiple independent monolingual text classification problems, that approach fails to exploit the opportunity offered by polylingual training documents to improve the effectiveness of the classifier. This paper proposes a method to combine different monolingual classifiers in order to obtain a new classifier that is as good as the best monolingual one and that can also deliver the best possible performance measures (precision, recall and F1). The proposed methodology was applied to, and evaluated on, a corpus of legal documents from the EUR-Lex site. The results obtained were quite good, indicating that combining different monolingual classifiers may be a promising approach to reach the best performance for each category independently of the language. |
URI: | http://hdl.handle.net/10174/4582 |
Type: | article |
Appears in Collections: | INF - Publicações - Artigos em Revistas Internacionais Com Arbitragem Científica
|