Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/1434

Title: Text classification using tree kernels and linguistic information
Authors: Gonçalves, Teresa; Quaresma, Paulo
Keywords: Text classification; Support vector machines; Linguistic information
Issue Date: Dec-2008
Publisher: IEEE Computer Society
Abstract: Standard machine learning approaches to text classification use the bag-of-words representation of documents to derive the classification target function. Typical linguistic structures such as morphology, syntax and semantics are completely ignored in the learning process. This paper examines the role of these structures in classifier construction, applying the study to the Portuguese language. Classifiers are built using the SVM algorithm on a dataset of newspaper articles. The results show that syntactic structure is not useful for text classification (as initially expected), but a novel structured representation that uses the documents' semantic information has the same discriminative power over classes as the traditional bag-of-words one.
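To make the baseline concrete, the following is a minimal sketch of a bag-of-words representation together with the linear kernel an SVM would compute over it. It is not the authors' pipeline: the whitespace tokenizer and the helper names (`bag_of_words`, `linear_kernel`, `normalized_kernel`) are illustrative assumptions, and no stemming, stop-word removal or term weighting is applied. The Portuguese example shows why word order, and with it syntax, disappears from this representation.

```python
from collections import Counter
import math


def tokenize(text: str) -> list[str]:
    # Assumed naive tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def bag_of_words(text: str) -> Counter:
    # Keep only word counts; order and structure are discarded.
    return Counter(tokenize(text))


def linear_kernel(a: Counter, b: Counter) -> int:
    # Dot product of two sparse count vectors (only shared words contribute;
    # a missing key in a Counter reads as 0 and is not inserted).
    return sum(count * b[word] for word, count in a.items())


def normalized_kernel(a: Counter, b: Counter) -> float:
    # Cosine-normalized kernel so document length does not dominate.
    denom = math.sqrt(linear_kernel(a, a) * linear_kernel(b, b))
    return linear_kernel(a, b) / denom if denom else 0.0


if __name__ == "__main__":
    a = bag_of_words("o cão mordeu o homem")   # "the dog bit the man"
    b = bag_of_words("o homem mordeu o cão")   # "the man bit the dog"
    print(a == b)                    # True: identical bags despite opposite meaning
    print(normalized_kernel(a, b))   # 1.0
```

Because both sentences map to the same vector, any classifier built on this representation cannot distinguish them; structured representations compared through tree kernels are one way to retain that information.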
URI: http://hdl.handle.net/10174/1434
Type: article
Appears in Collections: INF - Articles in Proceedings

Files in This Item:

goncalves_tc_treek.pdf (main document, 204.98 kB, Adobe PDF; restricted access, a copy can be requested)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
