Over the last 40 years, the automatic analysis of collections of text documents has been one of the most attractive challenges in the field of information retrieval. More recently, the focus has moved towards dynamic, distributed environments, where documents are continuously created by the users of a virtual community, i.e., a social network. In the case of Twitter, such documents, called tweets, are usually related to events that involve many people in different parts of the world. In this work we present a system for real-time Twitter data analysis that makes it possible to follow a generic event from the user's point of view. The topic detection algorithm we propose is an improved version of the Soft Frequent Pattern Mining algorithm, designed to deal with dynamic environments. In particular, in order to obtain prompt results, the whole Twitter stream is split into dynamic windows whose size depends on both the volume of tweets and time. Moreover, the set of terms we use to query Twitter is progressively refined to include new relevant keywords that signal the emergence of new subtopics or new trends in the main topic. Tests have been performed to evaluate the performance of the framework, and experimental results show the effectiveness of our solution.
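The abstract states only that window size depends on both tweet volume and time; it does not give the exact rule. As a minimal sketch of one possible reading, the Python generator below closes a window when it reaches a tweet-count cap or when a time span has elapsed, whichever comes first. The function name `dynamic_windows`, the parameters `max_tweets` and `max_seconds`, and the `(timestamp, text)` tweet representation are all assumptions made for illustration, not the paper's actual design.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical tweet representation: (timestamp in seconds, text).
Tweet = Tuple[float, str]


def dynamic_windows(
    stream: Iterable[Tweet],
    max_tweets: int = 1000,      # assumed volume cap per window
    max_seconds: float = 60.0,   # assumed time cap per window
) -> Iterator[List[Tweet]]:
    """Split a time-ordered tweet stream into windows.

    A window is emitted when it holds `max_tweets` tweets, or when the
    next tweet arrives `max_seconds` or more after the window's first
    tweet. This is an illustrative rule, not the one from the paper.
    """
    window: List[Tweet] = []
    window_start = 0.0
    for ts, text in stream:
        # Time-based cut: the incoming tweet falls outside the time span.
        if window and ts - window_start >= max_seconds:
            yield window
            window = []
        if not window:
            window_start = ts
        window.append((ts, text))
        # Volume-based cut: the window is full.
        if len(window) >= max_tweets:
            yield window
            window = []
    # Flush whatever remains when the stream ends.
    if window:
        yield window
```

Under this rule a burst of activity produces many short-lived windows, while a quiet period still yields a result at least once per time span, which is one way to keep results prompt in both regimes.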
|Title of host publication||2015 IEEE International Conference on Communications (ICC)|
|Number of pages||6|
|Publication status||Published - 2015|
|Name||IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS|