Real-Time-Data-Analysis-of-Twitter-API-Using-Apache-Spark-Streaming

Twitter is a rich source of real-time data: its public API provides a large volume of publicly available data. We used Kafka to stream data from the Twitter API into Spark, PySpark to perform analytics, and Spark Streaming to write the results to a Hive database. We then visualized the analysis in Tableau.

Primary language: Python

Social media usage is growing rapidly, which is driving the evolution of many big data technologies and frameworks for storing and processing data. Storing the data is only part of the problem; processing it efficiently is what makes it easy to work with. There are several processing models, including batch processing, stream processing, real-time processing, and hybrid processing, and many streaming platforms and frameworks are currently available. Spark Streaming is one of the best-suited frameworks for real-time data, and it is the focus of this project. A Spark Streaming project needs a continuous source of real-time data. Twitter is a rich source: its public API provides a large volume of publicly available data. We used Kafka to stream data from the Twitter API into Spark, PySpark to perform analytics, and Spark Streaming to write the results to a Hive database.
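The flow described above (Kafka to Spark to Hive) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It uses Spark's Structured Streaming API, which may differ from the streaming API the project itself uses. The broker address `localhost:9092`, the topic name `tweets`, the table name `tweet_hashtag_counts`, and the helper names `extract_hashtags` and `start_hashtag_stream` are all assumptions made for this example. It also assumes each Kafka message is a JSON-encoded tweet with a `text` field.

```python
import json
import re
from typing import List

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(tweet_json: str) -> List[str]:
    """Return the lowercase hashtags of one JSON-encoded tweet.

    Malformed messages, or tweets without a string "text" field,
    yield an empty list instead of raising.
    """
    try:
        text = json.loads(tweet_json).get("text", "")
    except (json.JSONDecodeError, AttributeError):
        return []
    if not isinstance(text, str):
        return []
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def start_hashtag_stream(spark,
                         bootstrap_servers="localhost:9092",  # assumed broker
                         topic="tweets",                      # assumed topic
                         table="tweet_hashtag_counts"):       # assumed table
    """Count hashtags from a Kafka topic and keep the totals in a Hive table.

    `spark` is expected to be a SparkSession created with
    enableHiveSupport() and with the Kafka connector package
    (spark-sql-kafka) available on the classpath.
    """
    # Imported here so the parsing helper above works without PySpark.
    from pyspark.sql.functions import col, explode, udf
    from pyspark.sql.types import ArrayType, StringType

    hashtags_udf = udf(extract_hashtags, ArrayType(StringType()))

    # Each Kafka record arrives with a binary `value` column.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", bootstrap_servers)
           .option("subscribe", topic)
           .load())

    # One output row per hashtag, then a running count per hashtag.
    tags = raw.select(
        explode(hashtags_udf(col("value").cast("string"))).alias("hashtag"))
    counts = tags.groupBy("hashtag").count()

    def write_batch(batch_df, batch_id):
        # In "complete" mode every micro-batch holds the full result,
        # so the table is overwritten on each trigger.
        batch_df.write.mode("overwrite").saveAsTable(table)

    return (counts.writeStream
            .outputMode("complete")
            .foreachBatch(write_batch)
            .start())
```

The parsing helper can be tried without a cluster, for example `extract_hashtags('{"text": "Streaming with #Spark and #Kafka"}')`. A table maintained this way could then serve as the data source for the Tableau dashboards.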