sherlockjjj/T-Watch

Real Time Twitter Sentiment Analysis Product


Real Time Twitter Stream Analysis via Kafka and Spark Streaming

Motivation:

Build a data product that processes streaming data through an end-to-end pipeline that can be scaled on demand.

Model Training:

Training a TF-IDF and random forest model using a Pipeline in Spark ML
Saving the trained models to S3
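
The training steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names ("text", "label"), the hyperparameters, the function names, and the S3 URI passed to train_and_save are all assumptions, and HashingTF followed by IDF is one common way to build TF-IDF features in Spark ML.

```python
def build_sentiment_pipeline(text_col="text", label_col="label",
                             num_features=4096, num_trees=50):
    """Return an unfitted Spark ML Pipeline: tokenize -> TF -> IDF -> random forest.

    Must be called after a SparkSession exists, because Spark ML stages
    create JVM objects when they are constructed.
    """
    # Imported lazily so the module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import HashingTF, IDF, Tokenizer

    tokenizer = Tokenizer(inputCol=text_col, outputCol="tokens")
    term_freq = HashingTF(inputCol="tokens", outputCol="tf",
                          numFeatures=num_features)
    idf = IDF(inputCol="tf", outputCol="features")
    forest = RandomForestClassifier(featuresCol="features",
                                    labelCol=label_col, numTrees=num_trees)
    return Pipeline(stages=[tokenizer, term_freq, idf, forest])


def train_and_save(train_df, model_uri):
    """Fit the pipeline on a labeled DataFrame and persist the fitted model.

    model_uri is a hypothetical location such as
    "s3a://my-bucket/models/tweet-sentiment".
    """
    model = build_sentiment_pipeline().fit(train_df)
    model.write().overwrite().save(model_uri)
    return model
```

Saving the whole fitted PipelineModel, rather than only the classifier, keeps the tokenizer and IDF weights together with the forest, so the streaming job can reload everything with a single PipelineModel.load(model_uri) call.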

Real Time Analysis:

Collecting real-time Twitter streams through Kafka
Integrating Kafka with Spark Streaming
Loading the saved model to score incoming streams in Spark Streaming
Storing the incoming streams in MongoDB from Spark Streaming
Fetching data from MongoDB and publishing the results in a web application built with Flask
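
The real-time steps above might be wired together along these lines. This is a hedged sketch under several assumptions: it uses Spark Structured Streaming with foreachBatch, which may differ from the streaming API the project actually uses; the topic name, broker address, MongoDB URI, database and collection names ("twatch", "tweets"), the /sentiment route, and the mapping of prediction 1.0 to "positive" are all hypothetical. Reading from Kafka also requires the spark-sql-kafka connector package on the Spark classpath.

```python
import json
from datetime import datetime, timezone


def tweet_to_document(raw_value, prediction):
    """Turn one Kafka message value plus a model prediction into a MongoDB document."""
    tweet = json.loads(raw_value)
    return {
        "tweet_id": tweet.get("id_str"),
        "text": tweet.get("text", ""),
        # Assumption: the model is binary and label 1.0 means positive.
        "sentiment": "positive" if prediction == 1.0 else "negative",
        "scored_at": datetime.now(timezone.utc),
    }


def run_stream(spark, model_uri, brokers="localhost:9092", topic="tweets",
               mongo_uri="mongodb://localhost:27017"):
    """Read tweets from Kafka, score them with the saved model, store them in MongoDB."""
    # Imported lazily so the pure helper above works without these packages.
    from pymongo import MongoClient
    from pyspark.ml import PipelineModel
    from pyspark.sql.functions import col, get_json_object

    model = PipelineModel.load(model_uri)
    collection = MongoClient(mongo_uri)["twatch"]["tweets"]

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", brokers)
           .option("subscribe", topic)
           .load()
           .selectExpr("CAST(value AS STRING) AS value"))
    # Extract the tweet text and drop messages that have none.
    tweets = (raw.withColumn("text", get_json_object(col("value"), "$.text"))
              .na.drop(subset=["text"]))

    def score_and_store(batch_df, batch_id):
        # Each micro-batch is a static DataFrame, so the model applies directly.
        scored = model.transform(batch_df).select("value", "prediction")
        docs = [tweet_to_document(row["value"], row["prediction"])
                for row in scored.collect()]
        if docs:
            collection.insert_many(docs)

    return tweets.writeStream.foreachBatch(score_and_store).start()


def create_app(collection):
    """Build a small Flask app that reports sentiment counts from MongoDB."""
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/sentiment")
    def sentiment():
        pipeline = [{"$group": {"_id": "$sentiment", "count": {"$sum": 1}}}]
        return jsonify({row["_id"]: row["count"]
                        for row in collection.aggregate(pipeline)})

    return app
```

Collecting each micro-batch on the driver keeps the sketch short; for high tweet volumes, a connector that writes from the executors would likely scale better.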

Data Flow

Tools:

AWS (EC2, EMR, S3, SES)

Kafka

Spark (Spark Streaming, Spark SQL, Spark ML)

Flask

MongoDB

Plotly

Twilio