Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/13954
|
Title: | Using Graphs and Semantic Information to Improve Text Classifiers |
Authors: | Das, Nibaran; Gosh, Swarnendu; Gonçalves, Teresa; Quaresma, Paulo |
Editors: | Przeporkowski, Adam; Ogrodniczuk, Maciej |
Issue Date: | 2014 |
Publisher: | Springer |
Abstract: | Text classification using semantic information is the latest
trend of research due to its greater potential to accurately represent text
content compared with bag-of-words (BOW) approaches. On the other
hand, representation of semantics through graphs has several advantages
over the traditional feature-vector representation. Therefore,
error-tolerant graph matching techniques can be used for text classification.
Nevertheless, very few methodologies exist in the literature which use
semantic representation through graphs. In the present work, a methodology
is proposed to represent semantic information from a summarized text
as a graph. The discourse representation structure of a text
is used to represent its semantic content and is afterwards
transformed into a graph. Five different graph matching techniques
based on Maximum Common Subgraphs (mcs) and Minimum Common
Supergraphs (MCS) are evaluated on 20 classes from the Reuters dataset,
taking 10 documents of each class for both training and testing, using
the k-NN classifier. The results show that the technique
has the potential to perform text classification as well as the traditional
BOW approaches. Moreover, a majority-voting-based combination of the
semantic representation and a traditional BOW approach provided an
improved recognition accuracy on the same dataset. |
URI: | http://hdl.handle.net/10174/13954 |
Type: | article |
Appears in Collections: | INF - Publicações - Artigos em Revistas Internacionais Com Arbitragem Científica
|