Repositório Digital de Publicações Científicas: Polylingual text classification in the legal domain

Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/4582

Title:	Polylingual text classification in the legal domain
Authors:	Gonçalves, Teresa; Quaresma, Paulo
Keywords:	Polylingual text classification; Support vector machines
Issue Date:	2011
Publisher:	Edizioni Scientifiche Italiane
Abstract:	With the globalization trend there is a big amount of documents written in different languages. If these polylingual documents are already organized into existing categories one can deliver a learning model to classify newly arrived polylingual documents. Despite being able to adopt a naïve approach by considering the problem as multiple independent monolingual text classification problems, this approach fails to use the opportunity offered by polylingual training documents to improve the effectiveness of the classifier. This paper proposes a method to combine different monolingual classifiers in order to get a new classifier as good as the best monolingual one having also the ability to deliver the best performance measures possible (precision, recall and F1). The proposed methodology was applied to a corpus of legal documents – from the EUR-Lex site – and was evaluated. The obtained results were quite good, indicating that combining different monolingual classifiers may be a promising approach to reach the best performance for each category independently of the language.
URI:	http://hdl.handle.net/10174/4582
Type:	article
Appears in Collections:	INF - Publications - Articles in International Peer-Reviewed Journals

Files in This Item:

File	Description	Size	Format
diritto_tcg_pq.pdf		289.28 kB	Adobe PDF