TweetChallenge: A Python repository from hugoartigas

This is the README file for the code provided by CHELLY SWANN / ARTIGAS HUGO / BEN BOUAZZA ANASS.

Put train.csv and evaluation.csv in "code/data/".
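
As a quick sanity check before running anything, the short sketch below lists which of the two data files are missing. It is not part of the repository: the helper name missing_data_files is hypothetical, and it assumes that you run it from the directory that contains "code/".

```python
from pathlib import Path

# File names and folder are the ones named in this README.
REQUIRED_FILES = ("train.csv", "evaluation.csv")


def missing_data_files(data_dir="code/data"):
    """Return the required CSV files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumption: the current working directory contains the "code" folder.
    missing = missing_data_files()
    if missing:
        print("Missing in code/data/:", ", ".join(missing))
    else:
        print("Both data files found.")
```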

----------------------------------------------------------------------------------------------
Preprocessing : 

Make sure TensorFlow is installed on your computer.


To reproduce our best submission, run "python prepro1bis.py" and ignore all warnings.
This may take some time.
It computes the preprocessed data for both train.csv and evaluation.csv.


We include two preprocessing scripts, which correspond to Preproc 1bis and Preproc 2 in our report.

- Preproc 1bis : execute "python prepro1bis.py"

- Preproc 2 : execute "python prepro2.py"
Make sure PyTorch and transformers are installed on your computer.
Preproc 2 computes the Preproc 1 and BERT features. Processing both train.csv and evaluation.csv took more than 10 hours on an 8-core computer.
Please run this script only under Windows: because of differences in how parallelization is handled, we could not get it to work on UNIX operating systems.
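
The requirements above can be checked before starting a long preprocessing run. The sketch below is only an illustration and is not shipped with the code: the helper names are hypothetical, and it assumes the usual import names "tensorflow", "torch" and "transformers" for the libraries mentioned in this README.

```python
import importlib.util
import platform

# Assumption: standard import names of the libraries named in this README.
PREPROCESSING_MODULES = ("tensorflow", "torch", "transformers")


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_windows():
    """True when running under Windows, the only system we ran prepro2.py on."""
    return platform.system() == "Windows"


if __name__ == "__main__":
    missing = missing_modules(PREPROCESSING_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    if not is_windows():
        print("prepro2.py was only run successfully under Windows.")
```

TensorFlow is enough for Preproc 1bis as far as this README states; the other two libraries only matter for Preproc 2.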


-----------------------------------------------------------------------------------------------

Models :

To reproduce our best submission, run "python main_xgboost.py" and ignore all warnings.
This will take some time; consider running the script overnight if you are using your own computer.
The predictions are saved in "code/data/predictions.csv".


We include two models : our XGBoost model and our neural network.

- XGBoost (Preproc 1bis needed) : execute "python main_xgboost.py"

- Neural Network (Preproc 2 needed) : execute "python main_net.py bert"
A graph showing the evolution of the loss over the training epochs will pop up; you can close it without consequences.
You can run "python main_net.py no-bert" to ignore the BERT features. This will speed up training.
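
The order of the steps described in this README can be summarized in a small driver, sketched below. It is not included in the repository. The script names and the "bert" / "no-bert" arguments come from this README; the pipeline labels, the function names, and the assumption that the "no-bert" variant still needs Preproc 2 are ours for illustration.

```python
import subprocess
import sys

# Each pipeline is a preprocessing step followed by a model step.
# Assumption: "no-bert" still relies on the Preproc 2 output.
PIPELINES = {
    "xgboost": [["prepro1bis.py"], ["main_xgboost.py"]],
    "net-bert": [["prepro2.py"], ["main_net.py", "bert"]],
    "net-no-bert": [["prepro2.py"], ["main_net.py", "no-bert"]],
}


def pipeline_commands(name, python=sys.executable):
    """Return the commands of a pipeline, in the order they should be run."""
    return [[python, *step] for step in PIPELINES[name]]


def run_pipeline(name, workdir="."):
    """Run every step in order and stop at the first one that fails."""
    for command in pipeline_commands(name):
        subprocess.run(command, cwd=workdir, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_pipeline("xgboost") to execute them.
    for command in pipeline_commands("xgboost"):
        print(" ".join(command))
```

Set workdir to the folder that contains the scripts. Keep in mind that the Preproc 2 step alone took more than 10 hours for us.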