This project performs a sentiment analysis of the War in Ukraine using a large dataset of tweets published throughout 2022. Our aim is to extract, analyze, and interpret the sentiments, opinions, and emerging trends expressed on Twitter about the ongoing conflict, in order to give insight into public perception and the global discourse surrounding it.
- Claudia Agromayor
- Malo Langourieux
- Arthur Fournon
- Vincent Lefeuve
- Gauthier Riquier
- Nicolas Brandel
The primary dataset for this project is the "Ukraine Russian Crisis Twitter Dataset," which comprises over 1.2 million tweets covering a wide range of perspectives and voices discussing the conflict. The dataset is publicly available on Kaggle under that name.
- Data: The `data` directory contains tweets related to the War in Ukraine, taken from an online database, in CSV format.
- Src: The `src` directory contains the application code, including the code for running the web application.
- Tests: The `tests` directory contains the unit and coverage tests.
- ML: The `ml` directory contains the code needed to build the text classification models, covering both the shallow learning and the Transformer-based approaches.
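To make the "shallow learning" approach mentioned for the `ml` directory more concrete, here is a minimal sketch of a bag-of-words naive Bayes classifier written with the Python standard library only. It is an illustration, not the model used in this project: the class name `NaiveBayesClassifier`, the labels `pro_ukraine` and `pro_russia`, and the toy training sentences are all assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes over word counts, with add-one smoothing.

    Hypothetical helper for illustration; not the project's classifier.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                # Add-one (Laplace) smoothed log likelihood of the word.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy examples, not taken from the dataset.
train_texts = [
    "glory to ukraine and its defenders",
    "we stand with ukraine",
    "russia is protecting its borders",
    "support for the russian army",
]
train_labels = ["pro_ukraine", "pro_ukraine", "pro_russia", "pro_russia"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("stand with the defenders of ukraine")
print(prediction)
```

A real classifier would be trained on labelled tweets from the dataset and evaluated on held-out data; this sketch only shows the mechanics of counting words per class and comparing smoothed log probabilities.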
- Clone the Repository:
git clone https://gitlab-cw4.centralesupelec.fr/groupe-7-les-bg/war_ukraine.git
- Install the necessary packages:
make init
- Download the model and place it: download the model using the project's download link. Once you have it, extract it and place the /model folder inside the /ml folder.
- Download the pre-processed datasets: download them using the project's download link. Once you have them, place the /tweets_processed folder inside /data.
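Before running the project, it can help to confirm that the two downloaded folders ended up in the right place. The sketch below assumes it is run from the repository root and that the expected locations are `ml/model` and `data/tweets_processed`, as described in the steps above. The function name `missing_assets` is hypothetical and is not part of the repository.

```python
from pathlib import Path


def missing_assets(repo_root):
    """Return the expected asset folders that are absent under repo_root."""
    root = Path(repo_root)
    expected = [
        root / "ml" / "model",               # extracted model folder
        root / "data" / "tweets_processed",  # pre-processed tweets
    ]
    return [
        path.relative_to(root).as_posix()
        for path in expected
        if not path.is_dir()
    ]


if __name__ == "__main__":
    missing = missing_assets(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All downloaded assets are in place.")
```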
If you have Python3 installed:
- Run the project:
make build3
- Run unit tests:
make test3
If you only have Python installed:
- Run the project:
make build
- Run unit tests:
make test
Enjoy!
Req No. | Description | Importance | Current state |
---|---|---|---|
1 | Pre-process the datasets and extract knowledge | Crucial | Done |
2 | Create data visualisations from the dataset | Crucial | Done |
3 | Perform sentiment analysis on the dataset | Crucial | Done |
4 | Create a transformer/shallow learning-based tweet classifier (pro-Russian/pro-Ukrainian) | Important | Done |
5 | Make a web application using Dash | Important | Done |
6 | Create word clouds | Important | Done |
7 | Implement a choropleth using geographical data and the classification of the tweets | Important | Done |
8 | Provide a way for users to easily run the project (Makefile) | Important | Done |
9 | Add other plots to the web application | Medium | Done |
10 | Add unit and coverage testing | Medium | Partial |
11 | Provide documentation with docstrings and a Sphinx wiki | Medium | Partial |
12 | Compare other classification methods (rule-based, LSTMs...) | Low | In the future |
13 | Put the repository in a Docker container to run it easily | Low | In the future |
14 | Write a project report | Low | In the future |
15 | Analyse the datasets as time series | Very Low | Partial |
If you'd like to contribute to this project, feel free to fork the repository, create a new branch, make your changes, and submit a pull request. Make sure to follow the project's coding standards and guidelines.
For any questions or concerns, please contact the project maintainers:
- Claudia Agromayor: [claudia.agromayor@student-cs.fr]
- Malo Langourieux: [malo.langourieux@student-cs.fr]
- Arthur Fournon: [arthur.fournon@student-cs.fr]
- Vincent Lefeuve: [vincent.lefeuve@student-cs.fr]
- Gauthier Riquier: [gauthier.riquier@student-cs.fr]
- Nicolas Brandel: [nicolas.brandel@student-cs.fr]