Primary language: Jupyter Notebook. License: MIT.

Sentimental Big Brother


Description

French

En 2021, selon le rapport de l'Economist Intelligence Unit, la France a été classée comme démocratie défaillante par le Democracy Index.

Nos démocraties se numérisent depuis plusieurs années, et une part croissante du débat public se joue dorénavant sur les réseaux sociaux. Alors qu’en période d'élections les débats télévisées sont encadrées par l’ARCOM (ex-CSA), les débats au sein des réseaux sociaux échappent encore à un contrôle clair, et notamment par manque de métriques caractérisant les enjeux qui les traversent.

En tant que citoyens, et en tant qu'étudiants dans l’intelligence artificielle nous ressentons le besoin de mettre au service de notre démocratie des outils permettant de décrypter une partie du débat politique qui se déroule aujourd’hui sur Twitter.

A cet effet, nous étudions aujourd’hui le sentiment de la twittosphère à l’encontre des différents candidats en fonction du temps.

Merci de l’attention que vous portez à notre travail, tout commentaire et toute aide est la bienvenue.

==============================

English

In 2021, the Democracy Index published by the Economist Intelligence Unit ranked France as a flawed democracy.

Our democracies have been going digital for several years, and a growing share of public debate now plays out on social networks. During election periods, televised debates are supervised by ARCOM (formerly the CSA), but debates on social networks still escape any clear oversight, notably because there are no metrics characterizing the issues that run through them.

As citizens, and as students in artificial intelligence, we feel the need to put tools at the service of our democracy that help decipher part of the political debate now taking place on Twitter.

To this end, we are currently studying the sentiment of the Twittersphere toward the different candidates as a function of time.

Thank you for your attention to our work. Any comments and help are welcome.

Second round

Final

First Round

Candidate order is random.

[Figures: one per candidate: Pecresse, Zemmour, Dupont-Aignan, Melenchon, Le Pen, Lassalle, Hidalgo, Macron, Jadot, Roussel, Arthaud, Poutou]

How to run as a module

poetry run python -m src --argument

How to download the datasets:

AclIMDB

poetry run python -m src data --download aclImdb

Twitter

Tweets can be downloaded from Twitter. A candidate must be mentioned:

poetry run python -m src data --download twitter --mention [candidat]

[candidat] must be one of ["Pecresse", "Zemmour", "Dupont-Aignan", "Melenchon", "Le Pen", "Lassalle", "Hidalgo", "Macron", "Jadot", "Roussel", "Arthaud", "Poutou"]

Several additional parameters are available:

  • --text: text to search for in the tweets, for example --text retraite
  • --start_time: date from which tweets are collected (format: YYYY-mm-DD HH:MM, where HH and MM are optional)
  • --end_time: date up to which tweets are collected (format: YYYY-mm-DD HH:MM, where HH and MM are optional)
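The time format described above could be checked before a download is launched. The following is a minimal sketch, not the project's actual parser: `parse_time` is a hypothetical helper, and it assumes that "HH and MM are optional" means a date alone, a date with an hour, or a date with hour and minutes are all acceptable.

```python
from datetime import datetime

# Accepted layouts, most specific first (an assumption about what
# "HH and MM are optional" allows).
_TIME_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d %H", "%Y-%m-%d")


def parse_time(value: str) -> datetime:
    """Parse a --start_time / --end_time style string (hypothetical helper)."""
    cleaned = value.strip()
    for fmt in _TIME_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next, less specific layout
    raise ValueError(f"expected 'YYYY-mm-DD [HH[:MM]]', got {value!r}")


if __name__ == "__main__":
    print(parse_time("2022-04-10"))
    print(parse_time("2022-04-10 20:00"))
```

Missing time components default to zero, so a bare date is read as midnight at the start of that day.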

Tweets collected from Twitter are saved to the file data/raw/[candidat]/twitter_{mention}_{start_time}_{end_time}.csv.
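To locate a collected file from a notebook, the naming pattern above can be rebuilt in a few lines. This is only a sketch under stated assumptions: `raw_tweet_path` is a hypothetical helper, the candidate list is copied from this README, and the time strings are inserted as given because the README does not say how the tool renders them in the file name.

```python
from pathlib import Path

# Candidate names as listed in this README.
CANDIDATES = [
    "Pecresse", "Zemmour", "Dupont-Aignan", "Melenchon", "Le Pen", "Lassalle",
    "Hidalgo", "Macron", "Jadot", "Roussel", "Arthaud", "Poutou",
]


def raw_tweet_path(mention: str, start_time: str, end_time: str,
                   root: Path = Path("data/raw")) -> Path:
    """Build data/raw/<mention>/twitter_<mention>_<start>_<end>.csv (hypothetical helper)."""
    if mention not in CANDIDATES:
        raise ValueError(f"unknown candidate: {mention!r}")
    # Assumption: start_time and end_time appear in the name exactly as passed.
    filename = f"twitter_{mention}_{start_time}_{end_time}.csv"
    return root / mention / filename


if __name__ == "__main__":
    print(raw_tweet_path("Macron", "2022-04-01", "2022-04-10"))
```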

How to process raw data with a given model:

The following command applies a model to a given .csv file or recursively to all .csv files in a directory. The path is relative to the data/raw directory.

poetry run python -m src features --model [model_name] --data [path_relative_to_data_raw]

The model name must be one of ["random", "naive_bayes", "twitter-xlm-roberta-base-sentiment"]; the default is "twitter-xlm-roberta-base-sentiment".

The model output is added as new columns and saved to a .csv file with the same relative path and name, but under the data/processed directory.
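The path handling described above (a single .csv file or a recursive directory walk, with results mirrored under data/processed) can be sketched as follows. The function names `find_raw_csvs` and `processed_path` are hypothetical and are not taken from the project's source; the sketch only illustrates the mapping, not the model inference itself.

```python
from pathlib import Path
from typing import List

RAW_ROOT = Path("data/raw")
PROCESSED_ROOT = Path("data/processed")


def find_raw_csvs(relative: str, raw_root: Path = RAW_ROOT) -> List[Path]:
    """Return the .csv file at `relative`, or every .csv below it if it is a directory."""
    target = raw_root / relative
    if target.is_file():
        return [target] if target.suffix == ".csv" else []
    # rglob walks the directory recursively; sorting keeps the order stable.
    return sorted(target.rglob("*.csv"))


def processed_path(raw_file: Path, raw_root: Path = RAW_ROOT,
                   processed_root: Path = PROCESSED_ROOT) -> Path:
    """Mirror a file under data/raw to the same relative location under data/processed."""
    return processed_root / raw_file.relative_to(raw_root)


if __name__ == "__main__":
    for raw_file in find_raw_csvs("Macron"):
        print(raw_file, "->", processed_path(raw_file))
```

Keeping the relative path identical on both sides means a processed file can always be traced back to the raw file it came from.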

Contributors