The application extracts a continuous stream of tweets based on location and analyzes the top trends in popular hashtags and popular mentions every 10 seconds.
- We first create a client-server connection on a local port and extract a live stream of data from the twitter-stream-api.
- The Spark application continuously listens on this port for new data.
- The streaming data is analyzed with Spark RDDs and DataFrames, using Spark operations such as map, reduce, and updateStateByKey to process the tweets every 10 seconds.
- Dashboards showing trends are refreshed automatically to reflect the changing trends.
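
The counting logic described above can be illustrated without a cluster. The following is a minimal plain-Python sketch, not the application's actual Spark code: it mimics the map step (extracting hashtags and mentions), the reduce step (counting within one batch), and the running-total update that updateStateByKey provides across batches. All function names, the regular expression, and the sample tweets are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed pattern: a hashtag or mention is '#' or '@' followed by word characters.
TAG_PATTERN = re.compile(r"[#@]\w+")


def extract_tokens(tweet):
    """Map step: return the hashtags and mentions in one tweet, lowercased."""
    return [tok.lower() for tok in TAG_PATTERN.findall(tweet)]


def count_batch(tweets):
    """Reduce step: count each hashtag/mention within a single batch."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_tokens(tweet))
    return counts


def update_state(state, batch_counts):
    """Fold one batch's counts into the running totals,
    which is the role updateStateByKey plays in Spark Streaming."""
    new_state = Counter(state)
    new_state.update(batch_counts)
    return new_state


def top_trends(state, prefix, n=10):
    """Return the n most frequent keys starting with prefix ('#' or '@')."""
    filtered = Counter({k: v for k, v in state.items() if k.startswith(prefix)})
    return filtered.most_common(n)


# Two made-up batches standing in for consecutive 10-second intervals.
batches = [
    ["Loving #Spark with @apache", "#spark streaming is fun"],
    ["#BigData and #Spark", "thanks @apache @databricks"],
]

state = Counter()
for batch in batches:
    state = update_state(state, count_batch(batch))

print(top_trends(state, "#", n=2))  # [('#spark', 3), ('#bigdata', 1)]
print(top_trends(state, "@", n=2))  # [('@apache', 2), ('@databricks', 1)]
```

In a Spark Streaming job the same three steps would typically run per micro-batch on a DStream, with the running totals kept by Spark rather than in a local variable.
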
Mastering Apache Spark (Jacek Laskowski): https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Official Apache Spark documentation: https://spark.apache.org/docs/latest/index.html