
A model that determines whether summaries of news articles were written by humans or generated by a machine.

Primary language: Python. License: MIT.

INF582-NLP_challenge

Paper associated with this Git repository: https://www.researchgate.net/publication/361569713_INF582_NLP_Challenge_Summary_Source_Prediction
Git repository (with this README): https://github.com/paultheron-X/INF582-NLP_challenge.git
Authors: Jérémie Dentan, Louis Gautier, Paul Théron

Getting started

To set up the project, create and activate a virtual environment with:

python3 -m venv .venv
source ./.venv/bin/activate

Then install the required libraries with:

pip install --upgrade pip
pip install -r requirements.txt
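To make sure the packages from requirements.txt were installed into the virtual environment rather than into the system interpreter, you can check which environment Python is running in. The sketch below assumes Python 3.3 or later (where `sys.base_prefix` exists); the helper name `in_virtualenv` is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv (e.g. one made with `python3 -m venv`).

    Hypothetical helper, not part of this repository. Inside a venv,
    sys.prefix points at the environment directory, while sys.base_prefix
    still points at the base interpreter installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run `source ./.venv/bin/activate` first.")
```

Running this with `python` after `source ./.venv/bin/activate` should report the `.venv` directory.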

Documentation

An academic description of our work is available in the associated paper (see the link above).

Download data and compute features

To reproduce the results of the paper, run:

bash sh/data_download.sh
bash preprocessing.sh

Preprocessing takes about 1 hour. To skip it, you can use the precomputed .csv files provided in the processed_data directory.
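If you skip preprocessing, the precomputed files can be inspected directly. This README does not document their file names or columns, so the sketch below only assumes that processed_data contains comma-separated files with a header row; `load_processed_csvs` is a hypothetical helper that uses the standard library only.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_processed_csvs(directory: str = "processed_data") -> dict[str, list[dict[str, str]]]:
    """Load every .csv file in `directory` into a list of row dictionaries.

    Hypothetical helper: assumes each file has a header row. Values are kept
    as strings; convert them to numbers as needed. Returns an empty dict if
    the directory does not exist.
    """
    base = Path(directory)
    if not base.is_dir():
        return {}
    tables: dict[str, list[dict[str, str]]] = {}
    for path in sorted(base.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.name] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    for name, rows in load_processed_csvs().items():
        columns = list(rows[0].keys()) if rows else []
        print(f"{name}: {len(rows)} rows, columns = {columns}")
```

For larger files, a dataframe library would be more convenient; the standard-library version is used here only to keep the sketch dependency-free.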

Prediction

Simply run:

python main.py

or, to run the same algorithm with tuned hyperparameters (it currently reaches the same performance):

python main_xgbtuned.py
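This README does not say how the tuned parameters used by main_xgbtuned.py were obtained. Purely as an illustration of the general idea, an exhaustive grid search tries every combination of candidate values and keeps the one with the best validation score. The `grid_search` helper, the parameter names and the toy scoring function below are hypothetical and are not taken from the repository.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    grid: Dict[str, List[Any]],
    score: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Try every combination in `grid` and return the best (params, score).

    Higher scores are treated as better (e.g. validation accuracy).
    On ties, the first combination encountered is kept.
    Hypothetical helper for illustration only.
    """
    names = sorted(grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


if __name__ == "__main__":
    # Toy scoring function standing in for "train a model with these
    # parameters and measure its validation accuracy".
    def toy_score(params: Dict[str, Any]) -> float:
        return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)

    grid = {"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1, 0.3]}
    print(grid_search(grid, toy_score))
```

In practice, each call to `score` would train and validate a model, so the grid size directly controls the tuning cost.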