

Files in This Item:

File: Mukras thesis.pdf (832.56 kB, Adobe PDF)
Title: Representation and learning schemes for sentiment analysis.
Authors: Mukras, Rahman
Supervisors: Wiratunga, Nirmalie
Lothian, Robert
Issue Date: Jan-2009
Publisher: The Robert Gordon University
Citation: MUKRAS, R., WIRATUNGA, N., LOTHIAN, R., CHAKRABORTI, S. and HARPER, D., 2007. Information gain feature selection for ordinal text classification using probability redistribution. In: Proceedings of IJCAI Textlink Workshop.
MUKRAS, R., WIRATUNGA, N. and LOTHIAN, R., 2007. Selecting bi-tags for sentiment analysis of text. In: Proceedings of AI-2007. Cambridge: Springer. pp. 181-194
MUKRAS, R., WIRATUNGA, N. and LOTHIAN, R., 2007. The Robert Gordon University at the opinion retrieval task of the 2007 TREC Blog Track. In: Proceedings of TREC 2007.
CHAKRABORTI, S., MUKRAS, R., LOTHIAN, R., WIRATUNGA, N., WATT, S. and HARPER, D., 2007. Supervised latent semantic indexing using adaptive sprinkling. In: Proceedings of IJCAI. AAAI Press. pp. 1582-1587.
ORECCHIONI, A., WIRATUNGA, N., MASSIE, S., CHAKRABORTI, S. and MUKRAS, R., 2007. Learning incident causes. In: Proceedings of ICCBR TCBR Workshop, 2007.
Abstract: This thesis identifies four novel techniques for improving the performance of text sentiment analysis systems. These include feature extraction and selection, enrichment of the document representation, and exploitation of the ordinal structure of rating classes. The techniques were evaluated on four sentiment-rich corpora using two well-known classifiers: Support Vector Machines and Naïve Bayes.

This thesis proposes the Part-of-Speech Pattern Selector (PPS), a novel technique for automatically selecting Part-of-Speech (PoS) patterns. The PPS selects its patterns from a background dataset using a number of measures, including Document Frequency, Information Gain, and the Chi-Squared Score. Extensive empirical results show that these patterns perform just as well as manually selected ones. This has important implications for both the cost and the time spent on manual pattern construction.

The position of a phrase within a document is shown to influence its sentiment orientation, and document classification performance can be improved by weighting phrases accordingly. It is, however, also shown to be necessary to sample the distribution of sentiment-rich phrases within documents of a given domain before adopting a phrase weighting criterion.

A key factor in choosing a classifier for an Ordinal Sentiment Classification (OSC) problem is its ability to address ordinal inter-class similarities. Two types of classifiers are investigated: those that can inherently solve multi-class problems, and those that decompose a multi-class problem into a sequence of binary problems. Empirical results showed the former to be more effective with regard to both mean squared error and classification time. Important features in an OSC problem are shown to distribute themselves across similar classes. Most feature selection techniques ignore inter-class similarities and hence easily overlook such features. The Ordinal Smoothing Procedure (OSP), which incorporates inter-class similarities into the feature selection process, is introduced in this thesis. Empirical results show the OSP to have a positive effect on mean squared error performance.
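
For readers unfamiliar with the three selection measures named in the abstract (Document Frequency, Information Gain, and the Chi-Squared Score), the following is a minimal sketch of how each can be computed for one candidate feature in a two-class setting. It uses the standard textbook definitions and is not taken from the thesis: the function name `score_feature`, the 2x2 contingency-table layout, and the example counts are all assumptions made for illustration, and the PPS itself may combine or apply these measures differently.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def score_feature(n11, n10, n01, n00):
    """Score one candidate feature from document counts (hypothetical helper).

    n11: positive documents containing the feature
    n10: negative documents containing the feature
    n01: positive documents without the feature
    n00: negative documents without the feature
    Returns (document_frequency, information_gain, chi_squared).
    """
    n = n11 + n10 + n01 + n00
    present, absent = n11 + n10, n01 + n00
    pos, neg = n11 + n01, n10 + n00

    # Document Frequency: number of documents containing the feature.
    df = present

    # Information Gain: class entropy minus the expected class entropy
    # after observing whether the feature is present or absent.
    h_class = entropy([pos / n, neg / n])
    h_cond = 0.0
    if present:
        h_cond += present / n * entropy([n11 / present, n10 / present])
    if absent:
        h_cond += absent / n * entropy([n01 / absent, n00 / absent])
    ig = h_class - h_cond

    # Chi-Squared Score for a 2x2 table (no continuity correction).
    denom = present * absent * pos * neg
    chi2 = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0

    return df, ig, chi2


# Illustrative counts only: a feature found in 40 of 50 positive
# documents and 10 of 50 negative documents.
df, ig, chi2 = score_feature(40, 10, 10, 40)
print(df, round(ig, 3), round(chi2, 1))
```

Under these definitions, a feature that occurs equally often in both classes scores zero on both Information Gain and Chi-Squared, while Document Frequency ignores class labels altogether; ranking candidates by any of the three and keeping the top entries is one simple way to use them for selection.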
Appears in Collections: Theses (Computing)

All items in OpenAIR are protected by copyright, with all rights reserved.

