The database was created from .csv files with PostgreSQL and hosted on a cloud database service (AWS RDS). The dashboard was built with Metabase, installed on a virtual server (AWS EC2) and connected to the cloud database.
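The loading step is not shown in the original. Below is a minimal sketch, assuming every column can start as TEXT and that the files are loaded from a client machine with psql. On a managed service such as RDS the database server cannot read files on your machine, so the client-side `\copy` meta-command is used rather than server-side `COPY ... FROM 'file'`. The helper names, table name and path are hypothetical.

```python
from __future__ import annotations

import csv


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_header(csv_path: str) -> list[str]:
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def build_load_script(table: str, columns: list[str], csv_path: str) -> str:
    """Build a psql script that creates `table` and bulk-loads `csv_path`.

    Every column is typed TEXT for simplicity (an assumption); real column
    types would be chosen per column afterwards.
    """
    cols = ",\n    ".join(f"{quote_ident(c)} TEXT" for c in columns)
    # Single quotes inside the path are doubled for the SQL string literal.
    path_literal = "'" + csv_path.replace("'", "''") + "'"
    return (
        f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} (\n    {cols}\n);\n"
        # \copy is a client-side psql command and must stay on one line.
        f"\\copy {quote_ident(table)} FROM {path_literal} "
        "WITH (FORMAT csv, HEADER true)\n"
    )
```

For a file such as `data/orders.csv`, the output of `build_load_script("orders", read_header("data/orders.csv"), "data/orders.csv")` can be saved as `load.sql` and run with `psql -h <rds-endpoint> -U <user> -d <db> -f load.sql`.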
Entity-Relationship Diagram (created with DbSchema).
This is a Natural Language Processing project on Sentiment Analysis that uses a RoBERTa model to predict the sentiment (positive, negative or neutral) of tweets containing the keyword 'Balenciaga'. This term was chosen to investigate the crisis that engulfed the designer brand in late November 2022, when backlash to an ad campaign featuring S&M-inspired products alongside children boiled over into social media outrage, cable news takedowns, vandalism and protests at stores.
The Twitter footprint of such a scandal can give insight into how society reacts to controversies, as well as business analytics on damage control.
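The RoBERTa prediction step can be sketched as follows. The text does not name the exact checkpoint, so this assumes `cardiffnlp/twitter-roberta-base-sentiment-latest` from the Hugging Face Hub, a RoBERTa model fine-tuned on tweets with negative, neutral and positive classes. `predict` runs the model through `transformers` and reads the label order from the model config; `to_sentiment` turns each tweet's logits into a label plus softmax probabilities. Function names are illustrative.

```python
import math

# Fallback label order; predict() reads the actual order from the model config.
DEFAULT_LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax over a sequence of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_sentiment(logits, labels=DEFAULT_LABELS):
    """Map one tweet's class logits to (best label, {label: probability})."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


def predict(texts, model_name="cardiffnlp/twitter-roberta-base-sentiment-latest"):
    """Score a batch of tweets (requires `transformers` and `torch`)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    labels = [model.config.id2label[i] for i in range(model.config.num_labels)]
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return [to_sentiment(row.tolist(), labels) for row in logits]
```

Keeping the softmax and label mapping separate from model loading means that part can be checked without downloading the model.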
Tweets were scraped with the snscrape library, which, unlike Tweepy, doesn't require API keys and can scrape historical data. In the first version of the notebook, a total of 1000 tweets were collected from 01.12.2022 to 02.12.2022 and stored in an Elasticsearch index. They were then queried, cleaned and preprocessed, and passed to an ETL job that applied the RoBERTa model. The resulting processed tweets and sentiment analysis results went through a preliminary Exploratory Data Analysis (EDA) and were stored in another Elasticsearch index.
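A minimal sketch of the collection and storage steps, assuming snscrape's `TwitterSearchScraper` and the official `elasticsearch` Python client with its `helpers.bulk` function. The index name `tweets-raw`, the local URL, the stored fields and the helper names are hypothetical; the notebook's actual schema is not described.

```python
import itertools


def build_query(keyword, since, until, lang=None):
    """Build a Twitter search query restricted to a date window (YYYY-MM-DD)."""
    query = f"{keyword} since:{since} until:{until}"
    return f"{query} lang:{lang}" if lang else query


def scrape(query, limit=1000):
    """Collect up to `limit` tweets as plain dicts (requires snscrape)."""
    import snscrape.modules.twitter as sntwitter

    items = sntwitter.TwitterSearchScraper(query).get_items()
    return [
        {
            "id": t.id,
            "date": t.date.isoformat(),
            "username": t.user.username,
            # The text attribute is `rawContent` in newer snscrape releases
            # and `content` in older ones.
            "text": getattr(t, "rawContent", None) or getattr(t, "content", ""),
        }
        for t in itertools.islice(items, limit)
    ]


def to_bulk_actions(tweets, index):
    """Turn tweet dicts into actions for elasticsearch.helpers.bulk."""
    for tweet in tweets:
        # Using the tweet id as the document id makes re-runs overwrite
        # rather than duplicate documents.
        yield {"_index": index, "_id": str(tweet["id"]), "_source": tweet}


def index_tweets(tweets, index="tweets-raw", url="http://localhost:9200"):
    """Bulk-index tweets; returns helpers.bulk's (success count, errors)."""
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(url)
    return helpers.bulk(es, to_bulk_actions(tweets, index))
```

A run matching the first version of the notebook would then be roughly `index_tweets(scrape(build_query("Balenciaga", "2022-12-01", "2022-12-02"), limit=1000))`.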
- Utilize emojis instead of excluding them from the Sentiment Analysis
- Compare RoBERTa with other models, e.g. VADER
- Set up Kibana as a dashboard
- Gather the cleaning and preprocessing functions into a Python class
- Repeat the project for streaming data using Tweepy, PySpark and Kafka
- Dockerize / Cloud computing / Airflow