Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/1410

Title: Using linguistic information to classify Portuguese text documents
Authors: Gonçalves, Teresa
Quaresma, Paulo
Keywords: Text classification
Support vector machines
Linguistic Information
Issue Date: Oct-2008
Publisher: IEEE Computer Society
Abstract: This paper examines the role of various linguistic structures in text classification, applying the study to the Portuguese language. Besides using a bag-of-words representation, where we evaluate different measures and use linguistic knowledge for term selection, we run several experiments using syntactic information, representing documents as strings of words and strings of syntactic parse trees. To build the classifier we use the Support Vector Machine (SVM) algorithm, which is known to produce good results on text classification tasks, and apply the study to a dataset of articles from the Público newspaper. The results show that the syntactic structure of sentences is not useful for text classification (as initially expected), but part-of-speech information can be used as a term selection technique to construct the bag-of-words representation of documents.
URI: http://hdl.handle.net/10174/1410
ISBN: 978-0-7695-3441-1
Type: article
Appears in Collections: INF - Artigos em Livros de Actas/Proceedings

Files in This Item:

File: goncalves-classifyPortuguesedocs.pdf
Description: main document
Size: 245.68 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
