|Title: ||Representation and learning schemes for sentiment analysis.|
|Authors: ||Mukras, Rahman|
|Supervisors: ||Wiratunga, Nirmalie|
|Issue Date: ||Jan-2009|
|Publisher: ||The Robert Gordon University|
|Citation: ||MUKRAS, R., WIRATUNGA, N., LOTHIAN, R., CHAKRABORTI, S. and HARPER, D., 2007. Information gain feature selection for ordinal text classification using probability redistribution. In: Proceedings of IJCAI Textlink Workshop.|
MUKRAS, R., WIRATUNGA, N. and LOTHIAN, R., 2007. Selecting bi-tags for sentiment analysis of text. In: Proceedings of AI-2007. Cambridge: Springer. pp. 181-194.
MUKRAS, R., WIRATUNGA, N. and LOTHIAN, R., 2007. The Robert Gordon University at the opinion retrieval task of the 2007 TREC blog track. In: Proceedings of TREC 2007.
CHAKRABORTI, S., MUKRAS, R., LOTHIAN, R., WIRATUNGA, N., WATT, S. and HARPER, D., 2007. Supervised latent semantic indexing using adaptive sprinkling. In: Proceedings of IJCAI. AAAI Press. pp. 1582-1587.
ORECCHIONI, A., WIRATUNGA, N., MASSIE, S., CHAKRABORTI, S. and MUKRAS, R., 2007. Learning incident causes. In: Proceedings of ICCBR TCBR Workshop, 2007.
|Abstract: ||This thesis identifies four novel techniques for improving the performance of text sentiment
analysis systems. These include feature extraction and selection, enrichment of the document representation
and exploitation of the ordinal structure of rating classes. The techniques were evaluated
on four sentiment-rich corpora, using two well-known classifiers: Support Vector Machines and
This thesis proposes the Part-of-Speech Pattern Selector (PPS), which is a novel technique
for automatically selecting Part-of-Speech (PoS) patterns. The PPS selects its patterns from a
background dataset by use of a number of measures including Document Frequency, Information
Gain, and the Chi-Squared Score. Extensive empirical results show that these patterns perform
just as well as the manually selected ones. This has important implications in terms of both the
cost and the time spent in manual pattern construction.
The position of a phrase within a document is shown to influence its sentiment orientation,
and document classification performance is shown to improve when phrases are weighted
by position. It is, however, also shown to be necessary to sample the distribution of
sentiment-rich phrases within documents of a given domain before adopting a phrase weighting criterion.
A key factor in choosing a classifier for an Ordinal Sentiment Classification (OSC) problem is
its ability to address ordinal inter-class similarities. Two types of classifiers are investigated: those
that can inherently solve multi-class problems, and those that decompose a multi-class problem
into a sequence of binary problems. Empirical results show the former to be more effective with
regard to both mean squared error and classification time.
Important features in an OSC problem are shown to distribute themselves across similar
classes. Most feature selection techniques are ignorant of inter-class similarities and hence easily
overlook such features. The Ordinal Smoothing Procedure (OSP), which incorporates inter-class
similarities into the feature selection process, is introduced in this thesis. Empirical results show
the OSP to have a positive effect on mean squared error performance.|
|Appears in Collections:||Theses (Computing)|
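The abstract names three measures used to rank candidate patterns: Document Frequency, Information Gain, and the Chi-Squared Score. This record does not describe how the thesis computes or combines them, so the following is only a minimal sketch of the standard textbook definitions for a binary feature in a two-class corpus. The function name `score_feature`, the toy documents, and the pattern tokens are hypothetical and are not taken from the thesis.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def score_feature(docs, labels, feature):
    """Return (document frequency, information gain, chi-squared score).

    docs    -- list of sets, each holding the features present in a document
    labels  -- list of 0/1 class labels, aligned with docs
    feature -- the feature (for example, a pattern token) to score
    """
    n = len(docs)
    # 2x2 contingency table: feature present/absent versus class 1/0.
    a = sum(1 for d, y in zip(docs, labels) if feature in d and y == 1)
    b = sum(1 for d, y in zip(docs, labels) if feature in d and y == 0)
    c = sum(1 for d, y in zip(docs, labels) if feature not in d and y == 1)
    e = sum(1 for d, y in zip(docs, labels) if feature not in d and y == 0)

    # Document frequency: number of documents containing the feature.
    df = a + b

    # Information gain: class entropy minus entropy conditioned on the feature.
    h_class = entropy([a + c, b + e])
    h_cond = (df / n) * entropy([a, b]) + ((n - df) / n) * entropy([c, e])
    ig = h_class - h_cond

    # Chi-squared statistic for the 2x2 table (0.0 if any margin is empty).
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    chi2 = n * (a * e - b * c) ** 2 / denom if denom else 0.0

    return df, ig, chi2


# Hypothetical toy corpus: each document is a set of extracted pattern tokens.
docs = [
    {"very_good", "the_film"},
    {"very_good", "the_film"},
    {"too_slow", "the_film"},
    {"too_slow"},
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

for feat in ("very_good", "the_film"):
    print(feat, score_feature(docs, labels, feat))
```

In this toy corpus "the_film" has the higher document frequency, while "very_good" scores higher on information gain and chi-squared because it occurs in only one class. A frequent pattern is therefore not necessarily a discriminative one, which is one reason a selector might consult more than one measure.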