An association rule dynamics and classification approach to event detection and tracking in Twitter.
MetadataShow full item record
Twitter is a microblogging application used for sending and retrieving instant on-line messages of not more than 140 characters. There has been a surge in Twitter activities since its launch in 2006 as well as steady increase in event detection research on Twitter data (tweets) in recent years. With 284 million monthly active users Twitter has continued to grow both in size and activity. The network is rapidly changing the way global audience source for information and influence the process of journalism [Newman, 2009]. Twitter is now perceived as an information network in addition to being a social network. This explains why traditional news media follow activities on Twitter to enhance their news reports and news updates. Knowing the significance of the network as an information dissemination platform, news media subscribe to Twitter accounts where they post their news headlines and include the link to their on-line news where the full story may be found. Twitter users in some cases, post breaking news on the network before such news are published by traditional news media. This can be ascribed to Twitter subscribers' nearness to location of events. The use of Twitter as a network for information dissemination as well as for opinion expression by different entities is now common. This has also brought with it the issue of computational challenges of extracting newsworthy contents from Twitter noisy data. Considering the enormous volume of data Twitter generates, users append the hashtag (#) symbol as prefix to keywords in tweets. Hashtag labels describe the content of tweets. The use of hashtags also makes it easy to search for and read tweets of interest. The volume of Twitter streaming data makes it imperative to derive Topic Detection and Tracking methods to extract newsworthy topics from tweets. Since hashtags describe and enhance the readability of tweets, this research is developed to show how the appropriate use of hashtags keywords in tweets can demonstrate temporal evolvements of related topic in real-life and consequently enhance Topic Detection and Tracking on Twitter network. We chose to apply our method on Twitter network because of the restricted number of characters per message and for being a network that allows sharing data publicly. More importantly, our choice was based on the fact that hashtags are an inherent component of Twitter. To this end, the aim of this research is to develop, implement and validate a new approach that extracts newsworthy topics from tweets' hashtags of real-life topics over a specified period using Association Rule Mining. We termed our novel methodology Transaction-based Rule Change Mining (TRCM). TRCM is a system built on top of the Apriori method of Association Rule Mining to extract patterns of Association Rules changes in tweets hashtag keywords at different periods of time and to map the extracted keywords to related real-life topic or scenario. To the best of our knowledge, the adoption of dynamics of Association Rules of hashtag co-occurrences has not been explored as a Topic Detection and Tracking method on Twitter. The application of Apriori to hashtags present in tweets at two consecutive period t and t + 1 produces two association rulesets, which represents rules evolvement in the context of this research. A change in rules is discovered by matching every rule in ruleset at time t with those in ruleset at time t + 1. The changes are grouped under four identified rules namely 'New' rules, 'Unexpected Consequent' and 'Unexpected Conditional' rules, 'Emerging' rules and 'Dead' rules. The four rules represent different levels of topic real-life evolvements. For example, the emerging rule represents very important occurrence such as breaking news, while unexpected rules represents unexpected twist of event in an on-going topic. The new rule represents dissimilarity in rules in rulesets at time t and t+1. Finally, the dead rule represents topic that is no longer present on the Twitter network. TRCM revealed the dynamics of Association Rules present in tweets and demonstrates the linkage between the different types of rule dynamics to targeted real-life topics/events. In this research, we conducted experimental studies on tweets from different domains such as sports and politics to test the performance effectiveness of our method. We validated our method, TRCM with carefully chosen ground truth. The outcome of our research experiments include: Identification of 4 rule dynamics in tweets' hashtags namely: New rules, Emerging rules, Unexpected rules and 'Dead' rules using Association Rule Mining. These rules signify how news and events evolved in real-life scenario. Identification of rule evolvements on Twitter network using Rule Trend Analysis and Rule Trace. Detection and tracking of topic evolvements on Twitter using Transaction-based Rule Change Mining TRCM. Identification of how the peculiar features of each TRCM rules affect their performance effectiveness on real datasets.