TwitterAnalytica

Project for the cloud computing course. It analyzes tweet streams with Apache Spark and visualizes the results in an interface built with Streamlit.

Authors

A listener on the Twitter API continuously reads the tweet stream.

The stream is sent to Kinesis, which handles saving it to S3.
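As a rough illustration of the step above, the sketch below shows how a single tweet could be packaged for Kinesis. It assumes a Kinesis Data Streams stream and the boto3 `put_record` call; the stream name `tweets-stream`, the helper names, and the choice of the tweet id as partition key are hypothetical and not taken from the project.

```python
import json


def build_kinesis_record(tweet: dict, stream_name: str = "tweets-stream") -> dict:
    """Build keyword arguments for a Kinesis Data Streams put_record call.

    The stream name and the partition key choice are assumptions.
    """
    # One JSON document per line keeps the objects saved on S3 easy to parse.
    payload = json.dumps(tweet, ensure_ascii=False) + "\n"
    return {
        "StreamName": stream_name,
        "Data": payload.encode("utf-8"),
        "PartitionKey": str(tweet["id"]),
    }


def send_tweet(tweet: dict) -> None:
    """Send one tweet to Kinesis (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("kinesis")
    client.put_record(**build_kinesis_record(tweet))


record = build_kinesis_record({"id": 1, "text": "hello"})
```

Only `build_kinesis_record` runs here; `send_tweet` would be called from the stream listener for each incoming tweet.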

AWS Glue organizes the stream into structured data in Parquet format.

S3 stores the collected data in buckets organized by day.
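One possible way to lay out data by day is to encode the date in the S3 object key. The sketch below is only an assumption about how such a layout might look: the `tweets` prefix, the `YYYY/MM/DD` pattern, and the `day_key` helper are hypothetical, and the project's actual layout may differ.

```python
from datetime import datetime, timezone


def day_key(created_at: datetime, name: str, prefix: str = "tweets") -> str:
    """Return an S3 object key placed under a per-day prefix (hypothetical layout).

    `created_at` should be timezone-aware; it is normalized to UTC first.
    """
    day = created_at.astimezone(timezone.utc)
    return f"{prefix}/{day:%Y/%m/%d}/{name}"


key = day_key(
    datetime(2021, 3, 7, 23, 30, tzinfo=timezone.utc),
    "batch-0001.parquet",
)
```

With a layout like this, a Spark job can read a single day by pointing at the matching prefix instead of scanning the whole bucket.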

The EMR cluster takes care of running Spark and processing the data.

The interface displays the data and the topics the user searches for, interacting with the cluster through the Apache Livy REST API.
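To make the last step more concrete, here is a minimal sketch of how a client could prepare a request that submits code to an existing Livy session through the `POST /sessions/{id}/statements` endpoint. The base URL (Livy's default port 8998 on localhost), the session id, the submitted code, and the helper name are assumptions for illustration; the request is built but not sent.

```python
import json
from urllib.request import Request


def livy_statement_request(base_url: str, session_id: int, code: str) -> Request:
    """Build a POST request that submits `code` to an existing Livy session."""
    url = f"{base_url.rstrip('/')}/sessions/{session_id}/statements"
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint, session id, and Spark code.
req = livy_statement_request(
    "http://localhost:8998", 0, "spark.sql('SELECT 1').show()"
)
```

Passing `req` to `urllib.request.urlopen` would send it. Livy answers with a statement object whose state can then be polled until the result is available.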