
Twitter Sentiment Analysis for Brand Improvement

This project is designed to pull real-time data from Twitter and perform sentiment analysis on it; it also stores the tweets for batch analysis.

This project is designed for brand improvement. It lets a firm or company track mentions and trends on Twitter and use those details to gauge the general public's view of the company, business and/or firm.

Sentiment analysis is performed on real-time tweets, and the tweets and results are visualised. At the same time, the same tweets are stored in batches and then analysed.
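
The per-tweet sentiment step can be illustrated with TextBlob, the library this project pairs with (see the TODO list). This is a minimal sketch, not the project's exact logic: the polarity thresholds and the function name are illustrative assumptions.

```python
# Minimal sketch of a per-tweet sentiment step using TextBlob.
# Thresholds and function name are illustrative assumptions.
from textblob import TextBlob

def classify_sentiment(text: str) -> str:
    # polarity ranges from -1.0 (most negative) to 1.0 (most positive)
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this brand!"))  # -> positive
```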

My Approach

The docker-compose.yml running Kafka and Kafdrop is hosted on DigitalOcean as a Docker droplet, alongside a Flask application for initialising the streams and viewing the batch analytics.
  • Kafdrop is running on <docker-droplet-host>:9000

  • Flask app is running on <docker-droplet-host>:5000

  • Kafka is running on <docker-droplet-host>:9020

  • PostgreSQL is a service on Azure

  • The consumer script consumefromkafka.py is hosted separately for now (a sketch of what such a consumer looks like appears below).

They are all available online.
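
For illustration, a consumer along these lines can read tweets from Kafka, score them, and write the results to PostgreSQL. This is only a sketch of what such a consumer might look like, not the actual consumefromkafka.py: the topic name, table name, column layout, and connection details are all assumptions.

```python
# Illustrative consumer: Kafka -> TextBlob sentiment -> PostgreSQL.
# The "tweets" topic, "tweet_sentiment" table, its columns, and the
# connection details are assumptions; the real consumefromkafka.py may differ.
import json
import psycopg2
from kafka import KafkaConsumer
from textblob import TextBlob

consumer = KafkaConsumer(
    "tweets",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
conn = psycopg2.connect(host="<azure-postgres-host>", dbname="tweets",
                        user="user", password="password")

for message in consumer:
    tweet = message.value
    polarity = TextBlob(tweet["text"]).sentiment.polarity
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO tweet_sentiment (tweet_id, text, polarity) VALUES (%s, %s, %s)",
            (tweet["id"], tweet["text"], polarity),
        )
    conn.commit()
```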

Cost:
  • No cost was incurred, as I used a combination of my student access to Azure and the GitHub Student Pack to use some paid services for free.

Project Structure


├── src
│   ├── app.py
│   ├── api.py
│   ├── apiConfig.py
│   ├── consumefromkafka.py
│   ├── templates
│   └── static
├── containerisation
│   └── docker-compose.yml
├── database_script
│   └── create_table.sql
├── .gitignore
├── Procfile
├── README.md
├── requirements.txt
└── runtime

Tools, Technology and Architecture

[architecture diagram]

1. Tweepy

Tweepy is a popular Python library for accessing the Twitter API. It allows for streaming and batch harvesting of tweets. A Twitter developer account is needed to use this awesome API; to have access to the necessary keys for the Twitter API, a user is expected to apply for an account here.

Tweepy documentation
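
A minimal sketch of the streaming side, assuming Tweepy 3.x and kafka-python: the credential placeholders, the "tweets" topic, and the track keyword are illustrative, not the project's exact configuration.

```python
# Sketch: stream tweets matching a keyword and forward each one to Kafka.
# Assumes Tweepy 3.x; credentials, topic name and keyword are placeholders.
import json
import tweepy
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

class TweetListener(tweepy.StreamListener):
    def on_status(self, status):
        producer.send("tweets", {"id": status.id, "text": status.text})

    def on_error(self, status_code):
        if status_code == 420:  # rate limited: return False to disconnect
            return False

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
stream = tweepy.Stream(auth=auth, listener=TweetListener())
stream.filter(track=["your-brand"], languages=["en"])
```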

2. Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Read more about Kafka here
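
Before the stream can publish, the target topic needs to exist; it can be created from the Kafdrop UI (next section) or programmatically. A sketch with kafka-python, where the "tweets" topic name is again an assumption:

```python
# Create the "tweets" topic programmatically (the name is an assumption;
# the topic can equally be created from the Kafdrop UI).
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="tweets", num_partitions=1, replication_factor=1)])
admin.close()
```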

3. Kafdrop

Kafdrop is an awesome tool for visualising Kafka: you can create and delete topics, visualise the data stored in Kafka, etc.

4. PostgreSQL

PostgreSQL stores the processed data and allows for other forms of batch usage.
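
As an example of batch usage, sentiment counts can be aggregated straight from the stored rows. The table and column names below follow the same assumptions as the consumer sketch above:

```python
# Illustrative batch read over the stored results; the table and column
# names follow the assumptions made in the consumer sketch above.
import psycopg2

conn = psycopg2.connect(host="<azure-postgres-host>", dbname="tweets",
                        user="user", password="password")
with conn.cursor() as cur:
    cur.execute("""
        SELECT CASE WHEN polarity > 0 THEN 'positive'
                    WHEN polarity < 0 THEN 'negative'
                    ELSE 'neutral' END AS sentiment,
               COUNT(*)
        FROM tweet_sentiment
        GROUP BY 1
    """)
    for sentiment, count in cur.fetchall():
        print(sentiment, count)
conn.close()
```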

5. PowerBI

Displays and visualises the data from Apache Spark in real time. It also does the same for the data in PostgreSQL, but this time as a batch display and visualisation.

Usage

  • clone the repository
  • cd to the containerisation directory and run `docker-compose up`, or `docker-compose up -d` to run in detached mode. This will set up Kafka and Kafdrop.
  • go to localhost:9000 to launch the Kafdrop web UI and create a topic using the UI
  • create a virtual environment
  • run pip install -r requirements.txt
  • connect to your Postgres database and create the table using the script in `create_table.sql`
  • then run `python app.py` to start the Flask application.
  • navigate to localhost:5000 and start streaming and view results from the dashboards.

TODO:

  • change the technology/tools for the consumer to Apache Spark
  • send real-time email alerts to users
  • combine an ML model with the results of TextBlob
  • add a login page to the Flask application
  • containerise the whole application, as some services are set up with Docker Compose while the others are hosted on EC2.
  • ....