A project incorporating big data technologies to stream and analyze Twitter content related to movies.
Software you need to run this application:
- Python 3
- Flask - Web Framework
- MongoDB - Database
- Apache Spark - Data Processing
- Kafka - Message Broker
-
When running the app for the first time:
- Execute the Movie Data Batch notebook once.
- Start the ZooKeeper server.
- Start the Kafka server.
- Create the topics "tweets" and "tweets_analysis".
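
The topic-creation step can be scripted. The sketch below only builds and runs the `kafka-topics.sh` command for each topic. It assumes a Kafka release that accepts `--bootstrap-server` (2.2 or later; older releases take `--zookeeper localhost:2181` instead), a broker on `localhost:9092`, and one partition and one replica per topic. None of these values come from the project itself, and `create_topic_command` and `create_topics` are hypothetical helper names.

```python
import subprocess

# Topic names taken from the setup steps above.
TOPICS = ["tweets", "tweets_analysis"]


def create_topic_command(topic, bootstrap_server="localhost:9092"):
    """Build the kafka-topics.sh invocation that creates one topic."""
    return [
        "kafka-topics.sh",  # assumed to be on PATH (it ships in Kafka's bin/ directory)
        "--create",
        "--topic", topic,
        "--bootstrap-server", bootstrap_server,  # assumed broker address
        "--partitions", "1",                     # assumed value
        "--replication-factor", "1",             # assumed value
    ]


def create_topics(topics=TOPICS):
    """Run the creation command for every topic; raises if a command fails."""
    for topic in topics:
        subprocess.run(create_topic_command(topic), check=True)
```

Calling `create_topics()` after the Kafka server is up would create both topics in one go.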
-
If not running the app for the first time:
- Start the ZooKeeper server.
- Start the Kafka server.
- Execute the Twitter notebook (gets tweets and puts them in the tweets topic).
- Execute the Filter Tweets notebook (gets tweets from the tweets topic, filters and counts them, and puts the results in the tweets_analysis topic).
- Run app.py to start the website.
- Go to http://localhost:5000 and open the dashboard to view the live stream.
-
Retrieve all movies currently playing. This is a batch operation and can be run once a day.
-
File: Movie Data Batch.ipynb
-
Get tweets with hashtags related to the movies
-
File: Twitter.ipynb
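
How the notebook maps movies to hashtags is not described here. As a rough illustration only, one simple convention is to strip everything except letters and digits from a title and prefix it with `#`. The helpers `title_to_hashtag` and `hashtags_in` below are hypothetical and are not taken from Twitter.ipynb.

```python
import re


def title_to_hashtag(title):
    """Turn a movie title into a hashtag by keeping only letters and digits (assumed convention)."""
    return "#" + re.sub(r"[^A-Za-z0-9]", "", title)


def hashtags_in(text):
    """Return the set of lowercased hashtags found in a tweet's text."""
    return {tag.lower() for tag in re.findall(r"#\w+", text)}
```

Lowercasing the hashtags on both sides before comparing them avoids missing tweets that differ only in capitalization.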
-
Process and analyze the tweets
-
File: Filter Tweets.ipynb
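
The filter-and-count step can be pictured with a plain-Python stand-in. This is not the notebook's Spark code: it assumes tweets arrive as text strings and that a lowercase hashtag-to-movie mapping is available, and it serializes the counts as JSON, which is only one possible format for the tweets_analysis topic. `count_movie_mentions` and `to_message` are hypothetical names.

```python
import json
import re
from collections import Counter


def count_movie_mentions(tweets, hashtag_to_movie):
    """Count how many tweets mention each movie via a known hashtag.

    hashtag_to_movie maps lowercase hashtags (e.g. "#examplemovie") to titles.
    Tweets without a known hashtag are filtered out; a tweet counts at most
    once per movie even if it repeats the hashtag.
    """
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in re.findall(r"#\w+", text)}
        movies = {hashtag_to_movie[tag] for tag in tags if tag in hashtag_to_movie}
        counts.update(movies)
    return counts


def to_message(counts):
    """Serialize the counts as UTF-8 JSON bytes (assumed message format)."""
    return json.dumps(dict(counts), sort_keys=True).encode("utf-8")
```

In the real pipeline the input would come from the tweets topic and the serialized result would be published to tweets_analysis.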
-
(Optional) Visualize results with notebook 4
-
File: Visualize.ipynb
- Get results from the Kafka topic and display updates
- Website folder (Flask app); app.py contains all logic