This project helps investigate the mood of people posting tweets on Twitter. It is built mainly on four components:
- Spark Streaming
- Twitter4j
- Stanford NLP
- JQVMap
Spark Streaming is an extension to Spark that adds support for continuous stream processing while inheriting Spark's data processing abilities.
Twitter4j is a Java library for the Twitter API. Through it, our Java application can fetch data from Twitter. The streaming data consists of filtered tweets delivered in real time.
Stanford NLP is a natural language processing toolkit developed by the Natural Language Processing Group at Stanford University. We use this toolkit to estimate the mood of each tweet.
JQVMap is a jQuery plugin that renders vector maps. In this project, we use it to display the real-time results in the browser.
Apache Maven 3.3.3
Spark 1.3.1
To compile the project, make sure Apache Maven is installed on your system.
To make the front-end program work, copy the directory "web" to "/var/www/html". The directory "colorData" stores the JSON file "colors.json", which serves as the input of JQVMap.
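As a rough illustration of how mood results could become JQVMap input, the sketch below turns per-region mood scores into a flat JSON object of region codes and hex colors. JQVMap can color regions from an object that maps region codes to color strings, but the actual layout of "colors.json" in this project is not documented here, so the file shape, the 0 to 4 score scale, the region codes, and all class and method names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from the project itself.
class ColorsJsonSketch {

    // Map an average mood score in [0, 4] (0 = most negative, 4 = most positive;
    // the scale is an assumption) to a hex color running from red via yellow to green.
    static String moodToColor(double score) {
        double t = Math.max(0.0, Math.min(4.0, score)) / 4.0; // clamp, then normalize to [0, 1]
        int red = (int) Math.round(255 * Math.min(1.0, 2.0 * (1.0 - t)));
        int green = (int) Math.round(255 * Math.min(1.0, 2.0 * t));
        return String.format(Locale.ROOT, "#%02x%02x00", red, green);
    }

    // Serialize region code -> color as a flat JSON object.
    // Region codes are assumed to be plain lowercase identifiers, so no escaping is done.
    static String toJson(Map<String, Double> moodByRegion) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, Double> e : moodByRegion.entrySet()) {
            sb.append(sep)
              .append('"').append(e.getKey()).append("\":\"")
              .append(moodToColor(e.getValue())).append('"');
            sep = ",";
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, Double> mood = new LinkedHashMap<>();
        mood.put("ca", 3.2); // made-up scores, for illustration only
        mood.put("ny", 1.0);
        System.out.println(toJson(mood));
    }
}
```

Writing the colors as a single flat object keeps the front end simple: the browser can fetch the file periodically and hand the parsed object to the map without further processing.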
Run the script compile.sh to compile the project.
bash compile.sh
Before you run the application, edit the "run.sh" script. Enter the credentials of your own Twitter application in place of the placeholders shown in the command below, and add the absolute path of the file "colors.json" as the last argument.
spark-submit --class TwitterEconomy target/TwitterMovie-1.0-jar-with-dependencies.jar <consumerKey> <consumerSecret> <accessToken> <accessTokenSecret> <JSON outputpath>
Run the script run.sh.
bash run.sh
This submits the application to the Spark master.
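The spark-submit command above passes five positional arguments to the application. A minimal sketch of how a main class might validate them is shown below. Twitter4j can read OAuth credentials from the system properties "twitter4j.oauth.consumerKey", "twitter4j.oauth.consumerSecret", "twitter4j.oauth.accessToken" and "twitter4j.oauth.accessTokenSecret"; whether this project passes them that way is an assumption, and the class, field, and method names are hypothetical.

```java
// Hypothetical sketch: the class and its members are illustrative, not the project's code.
class TwitterArgsSketch {
    final String consumerKey;
    final String consumerSecret;
    final String accessToken;
    final String accessTokenSecret;
    final String jsonOutputPath;

    private TwitterArgsSketch(String[] a) {
        consumerKey = a[0];
        consumerSecret = a[1];
        accessToken = a[2];
        accessTokenSecret = a[3];
        jsonOutputPath = a[4];
    }

    // Fail fast with a usage message if an argument is missing or empty.
    static TwitterArgsSketch parse(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(
                "Usage: <consumerKey> <consumerSecret> <accessToken> "
                + "<accessTokenSecret> <JSON outputpath>");
        }
        for (String arg : args) {
            if (arg == null || arg.trim().isEmpty()) {
                throw new IllegalArgumentException("Arguments must not be empty");
            }
        }
        return new TwitterArgsSketch(args);
    }

    // Expose the credentials through the system properties that Twitter4j reads.
    void applyAsTwitter4jProperties() {
        System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
        System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
        System.setProperty("twitter4j.oauth.accessToken", accessToken);
        System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
    }

    public static void main(String[] args) {
        TwitterArgsSketch cfg = parse(args);
        cfg.applyAsTwitter4jProperties();
        System.out.println("Colors will be written to " + cfg.jsonOutputPath);
    }
}
```

Checking the argument count up front gives a readable usage message at submit time instead of an authentication failure later in the stream.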