Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/13954

Title: Using Graphs and Semantic Information to Improve Text Classifiers
Authors: Das, Nibaran
Ghosh, Swarnendu
Gonçalves, Teresa
Quaresma, Paulo
Editors: Przepiórkowski, Adam
Ogrodniczuk, Maciej
Issue Date: 2014
Publisher: Springer
Abstract: Text classification using semantic information is the latest trend of research due to its greater potential to accurately represent text content compared with bag-of-words (BOW) approaches. On the other hand, representation of semantics through graphs has several advantages over the traditional representation of feature vector. Therefore, error tolerant graph matching techniques can be used for text classification. Nevertheless, very few methodologies exist in the literature which use semantic representation through graphs. In the present work, a methodology has been proposed to represent semantic information from a summarized text into a graph. The discourse representation structure of a text is utilized in order to represent its semantic content and, afterwards, it is transformed into a graph. Five different graph matching techniques based on Maximum Common Subgraphs (mcs) and Minimum Common Supergraphs (MCS) are evaluated on 20 classes from the Reuters dataset taking 10 docs of each class for both training and testing purposes using the k-NN classifier. From the results it can be observed that the technique has potential to perform text classification as well as the traditional BOW approaches. Moreover a majority voting based combination of the semantic representation and a traditional BOW approach provided an improved recognition accuracy on the same data set.
URI: http://hdl.handle.net/10174/13954
Type: article
Appears in Collections: INF - Publicações - Artigos em Revistas Internacionais Com Arbitragem Científica

Files in This Item:

File: poltal.pdf
Size: 511.77 kB
Format: Adobe PDF (restricted access; a copy can be requested)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
