Ukrainian War: A Global Opinion Analysis Using Twitter Data 🌍🐦

Overview 📜

This project performs a sentiment analysis of the war in Ukraine using a large dataset of tweets published throughout 2022. Our aim is to extract, analyze, and interpret the sentiments, opinions, and emerging trends expressed on Twitter about the conflict, and so to shed light on public perception and the global discourse surrounding it.

Contributors 👥

  • Claudia Agromayor
  • Malo Langourieux
  • Arthur Fournon
  • Vincent Lefeuve
  • Gauthier Riquier
  • Nicolas Brandel

Dataset 📊

The primary dataset for this project is the "Ukraine Russian Crisis Twitter Dataset," which comprises over 1.2 million tweets and covers a wide range of perspectives and voices discussing the conflict. The dataset is publicly available on Kaggle: Ukraine Russian Crisis Twitter Dataset.
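As a quick sanity check after downloading, the tweets can be counted per file. This is a minimal sketch using only the standard library; it assumes the files are plain, UTF-8 encoded CSV files with a header row, and the helper name `count_tweets` is hypothetical rather than part of this repository.

```python
import csv
from pathlib import Path


def count_tweets(data_dir):
    """Return {csv file name: number of data rows} for every CSV in data_dir."""
    counts = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        # newline="" is what the csv module expects; tweets may contain
        # line breaks inside quoted fields, which the reader then handles.
        with csv_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)  # first row is treated as the header
            counts[csv_path.name] = sum(1 for _ in reader)
    return counts
```

Calling `count_tweets("data")` from the repository root would then report one row count per CSV file found in that folder.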

Project Structure 🏗️

  1. Data:

    • The data directory contains the tweets about the war in Ukraine, taken from an online database and stored in CSV format.
  2. Src:

    • The src directory contains the project's source code, including the code that runs the web application.
  3. Tests:

    • The tests directory contains the unit and coverage tests.
  4. ML:

    • The ml directory contains the code that builds the text classification models, covering both the shallow learning and the Transformer-based approaches.
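To illustrate what a shallow learning text classifier can look like, here is a minimal multinomial Naive Bayes sketch written with the standard library only. It is not the model shipped in the ml directory: the class name, the tokenizer, and the four toy training texts are all invented for illustration, and a real run would train on the labelled tweets instead.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalised log-probability of each label for text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # class prior
            for token in tokenize(text):
                if token in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((counts[token] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data, invented purely to make the sketch runnable.
TOY_TEXTS = ["stand with ukraine", "support ukraine now",
             "stand with russia", "support russia now"]
TOY_LABELS = ["pro_ukraine", "pro_ukraine", "pro_russia", "pro_russia"]

classifier = NaiveBayesTweetClassifier().fit(TOY_TEXTS, TOY_LABELS)
```

After fitting, `classifier.predict(some_tweet_text)` returns the label with the highest score. A bag-of-words model like this ignores word order, which is one reason a Transformer-based approach is also worth exploring.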

How to Use 🛠️

  1. Clone the Repository:
    git clone https://gitlab-cw4.centralesupelec.fr/groupe-7-les-bg/war_ukraine.git
    
    
  2. Install the necessary packages:
    make init
    
  3. Download the model: Click here to download the model. Once you have downloaded it, extract it and place the /model folder inside the ml folder.
  4. Download the pre-processed dataset: Click here to download the pre-processed datasets. Once you have it, place the /tweets_processed folder inside /data.
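Before running the project, it can help to confirm that both downloads ended up where steps 3 and 4 expect them. The following is a small sketch, not a script from this repository: the function name `missing_assets` is hypothetical, and the two paths are simply the folder locations named in the steps above, relative to the repository root.

```python
from pathlib import Path

# Folder locations taken from steps 3 and 4, relative to the repository root.
REQUIRED_DIRS = ("ml/model", "data/tweets_processed")


def missing_assets(repo_root="."):
    """Return the required folders that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Model and pre-processed tweets are in place.")
```

Run from the repository root, it lists whichever of the two folders still needs to be extracted or moved.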

If you have Python 3 installed:

  1. Run the project:
    make build3
    
  2. Run unit tests:
    make test3
    

If you only have Python installed:

  1. Run the project:
    make build
    
  2. Run unit tests:
    make test
    

Enjoy!

Requirements ✅

| Req № | Description | Importance | Current state |
|---|---|---|---|
| 1 | Pre-process the datasets and extract knowledge 📚 | Crucial | ✅ Done |
| 2 | Create data visualisations from the dataset 📊 | Crucial | ✅ Done |
| 3 | Perform sentiment analysis on the dataset 💭 | Crucial | ✅ Done |
| 4 | Create a transformer/shallow learning-based tweet classifier (pro-Russian/pro-Ukrainian) 🐦 | Important | ✅ Done |
| 5 | Make a web application using Dash 🌐 | Important | ✅ Done |
| 6 | Create word clouds ☁️ | Important | ✅ Done |
| 7 | Implement a choropleth using geographical data and the classification of the tweets 🗺️ | Important | ✅ Done |
| 8 | Provide a way for users to easily run the project (Makefile) 🏃 | Important | ✅ Done |
| 9 | Add other plots to the web application 📈 | Medium | ✅ Done |
| 10 | Add unit and coverage testing 🧪 | Medium | 🚧 Partial |
| 11 | Provide documentation with docstrings and a Sphinx wiki 📝 | Medium | 🚧 Partial |
| 12 | Compare other classification methods (rule-based, LSTMs...) 🔄 | Low | ❌ In the future |
| 13 | Put the repository in a Docker container to run it easily 🐳 | Low | ❌ In the future |
| 14 | Write a project report 📄 | Low | ❌ In the future |
| 15 | Analyse the datasets as time series ⏳ | Very Low | 🚧 Partial |
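
Requirement 7 combines the classifier output with geographical data. One way the per-country values behind such a map could be computed is sketched below. This is an assumption about the approach, not the project's actual code: the function name and the `(country, predicted_label)` record shape are hypothetical.

```python
from collections import Counter, defaultdict


def label_share_by_country(records, label):
    """Map each country to the fraction of its tweets classified as `label`.

    `records` is an iterable of (country, predicted_label) pairs.
    """
    per_country = defaultdict(Counter)
    for country, predicted in records:
        per_country[country][predicted] += 1
    return {
        country: counts[label] / sum(counts.values())
        for country, counts in per_country.items()
    }
```

The resulting dictionary holds one value between 0 and 1 per country, which is the kind of input a choropleth colour scale needs.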

Contributing 👫

If you'd like to contribute to this project, feel free to fork the repository, create a new branch, make your changes, and submit a pull request. Make sure to follow the project's coding standards and guidelines.

Contact 📪

For any questions or concerns, please contact the project maintainers: