Sentiment analysis on streams of Twitter data

Bibliographic record details
Main author: Μπαλτάς, Αλέξανδρος
Other authors: Τσακαλίδης, Αθανάσιος
Format: Thesis
Language: English
Published: 2017
Subjects:
Available online: http://hdl.handle.net/10889/10365
Description
Abstract: Sentiment analysis on Twitter data is a challenging problem because of the nature, diversity and volume of the data. In this work we implement a system on Apache Spark, an open-source framework for programming with Big Data. The sentiment analysis tool is based on machine learning methods and natural language processing techniques, and it uses Apache Spark's machine learning library, MLlib. To cope with the nature of Big Data, we introduce several preprocessing steps on the input aimed at improving the sentiment analysis results. The classification algorithms are applied to both binary and ternary classification, and we examine how the dataset size and the input features affect the quality of the results.
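The abstract does not spell out the preprocessing steps. Below is a minimal, hypothetical sketch of preprocessing commonly applied to tweets before feature extraction, written in plain Python so it runs without a Spark cluster. The function name `preprocess_tweet`, the placeholder tokens `__url__` and `__user__`, and the regular expressions are illustrative assumptions, not the thesis's actual pipeline.

```python
import re
from typing import List

# Hypothetical normalisation patterns (assumptions, not taken from the thesis).
URL_RE = re.compile(r"https?://\S+|www\.\S+")       # links
MENTION_RE = re.compile(r"@\w+")                     # user mentions
HASHTAG_RE = re.compile(r"#(\w+)")                   # hashtags
REPEAT_RE = re.compile(r"([a-z])\1{2,}")             # letter repeated 3+ times
# Tokens: words (including placeholders) or simple emoticons such as :) ;-( :d
TOKEN_RE = re.compile(r"[a-z0-9_']+|[:;]-?[()dp]")


def preprocess_tweet(text: str) -> List[str]:
    """Normalise a raw tweet and split it into tokens (illustrative only)."""
    text = text.lower()
    # Replace links and mentions with generic placeholders so that
    # individual URLs and user names do not become sparse features.
    text = URL_RE.sub(" __url__ ", text)
    text = MENTION_RE.sub(" __user__ ", text)
    # Keep the hashtag word itself, dropping the leading '#'.
    text = HASHTAG_RE.sub(r"\1", text)
    # Shorten elongated words, e.g. "sooooo" -> "soo".
    text = REPEAT_RE.sub(r"\1\1", text)
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(preprocess_tweet("Sooo happy :) with #Spark!! @bob http://t.co/x"))
```

In a Spark job, a function like this could be wrapped as a user-defined function and its token lists passed to MLlib feature transformers such as `HashingTF` and `IDF` before training a classifier; the abstract does not say which transformers or classifiers the thesis actually uses.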