This project was developed as part of the Principles of Big Data class at UMKC, Spring 2016.
Twitter data analysis on “USA Presidential Candidates”. 1 GB of tweets was collected using twitter4j. Nine dynamic queries were written with Apache Spark SQL and RDDs to surface top trends such as tweet source, highest tweet counts per candidate, and top locations, along with sentiment discovery. A dynamic web application visualizes the results.
Environment: Windows 10
Tools: Eclipse, Apache Spark, Python, Java, JavaScript, D3.js, HTML/CSS
Bluemix for hosting the application
Analytical Queries:
Query 1: Tweet Count per Presidential Candidate
Query 2: Top 8 Most Frequently Tweeting Users
Query 3: Top 8 Users with the Most Followers
Query 4: Top Locations with the Most Tweets
Query 5: Users with More Than 150,000 Friends
Query 6: Top 8 Timestamps with the Most Tweets
Query 7: Sentiment Discovery
Query 8: Tweets from Different Types of Devices
Query 9: Tweet vs Retweet Status
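
To give a feel for the shape of these queries, here is a minimal sketch of Query 2 and Query 9. It is not the project's code: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs without a Spark installation, and the table name `tweets`, the columns `screen_name`, `followers_count`, and `is_retweet`, and the sample rows are all assumptions made up for illustration. The same GROUP BY / ORDER BY / LIMIT pattern would be expected to carry over to Spark SQL once the tweets are registered as a table.

```python
import sqlite3

# Hypothetical sample rows: (screen_name, followers_count, is_retweet).
# These values are invented; the real project used about 1 GB of tweets.
SAMPLE_TWEETS = [
    ("alice", 120, 0),
    ("bob", 4500, 1),
    ("alice", 120, 1),
    ("carol", 87, 0),
    ("alice", 120, 0),
    ("bob", 4500, 0),
]


def build_table(rows):
    """Load rows into an in-memory table named `tweets` (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets ("
        "screen_name TEXT, followers_count INTEGER, is_retweet INTEGER)"
    )
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", rows)
    return conn


def top_tweeting_users(conn, limit=8):
    """Query 2 shape: users ranked by how many tweets they posted."""
    sql = """
        SELECT screen_name, COUNT(*) AS tweet_count
        FROM tweets
        GROUP BY screen_name
        ORDER BY tweet_count DESC, screen_name ASC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


def tweet_vs_retweet(conn):
    """Query 9 shape: number of original tweets versus retweets."""
    sql = """
        SELECT CASE WHEN is_retweet = 1 THEN 'retweet' ELSE 'tweet' END AS kind,
               COUNT(*) AS n
        FROM tweets
        GROUP BY kind
        ORDER BY kind
    """
    return dict(conn.execute(sql).fetchall())


conn = build_table(SAMPLE_TWEETS)
print(top_tweeting_users(conn))
print(tweet_vs_retweet(conn))
```

The secondary sort on `screen_name` only makes ties deterministic in this sketch. Whether the original queries broke ties, and how retweets were actually detected, is not stated in this README.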