- Fridtjof Storm Flaate | fridtjof.flaate@epfl.ch
#169479
The goal of this project was to build a model that accurately classifies tweets as either positive or negative. In this project, you will find six different models: three classic machine learning models and three neural networks. The best-performing model is the neural network built on pre-trained Bidirectional Encoder Representations from Transformers (BERT). This transfer-learning model achieved an accuracy of 89.3% and an F1 score of 89.6%.
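For reference, the reported accuracy and F1 score can be computed from a model's predictions as in the sketch below. This is a minimal illustration with made-up label arrays, not the project's actual evaluation code.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1 score (minimal sketch)."""
    assert len(y_true) == len(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    # True positives, false positives, false negatives for the positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Toy example with made-up labels:
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
# acc = 0.8, f1 = 0.8
```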
This is a step-by-step guide on how to set up your environment and run `run.py`, which will create the submission file.
- conda
- pip3
- python3
- Download 'epfml-text' from here, unzip it, and add it to the `twitter-datasets` folder.
- Clone the repo and enter the `text_classification` directory
git clone https://github.com/StormFlaate/text_classification
- Create the environment
conda create --name text_classification
- Activate the environment
conda activate text_classification
- Install conda dependencies
conda install --file requirements.txt && conda install -c huggingface transformers
- Install pip dependencies
pip3 install -r requirements_pip.txt
- requirements.txt: file to install conda dependencies
- requirements_pip.txt: file to install python3 dependencies
- README.md: file containing information about the project
- SGD log loss: training and testing of model
- Logistic regression: training and testing of model
- Random forest: training and testing of model
- NN GloVe: Neural network with aggregated GloVe word embeddings
- NN sentence transformer: NN with all-MiniLM-L6-v2 sentence embedding
- NN transfer learning BERT: Transfer learning model BERT
- run.py: file containing everything to recreate best submission
- helper functions: contains all helper functions and classes used in the project
- twitter-datasets: will contain all datasets used for this project - needs to be downloaded manually.
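As a rough illustration of how the NN GloVe model's input features can be built, the sketch below averages per-word GloVe vectors into one fixed-size tweet embedding. The tiny embedding dictionary is invented for the example (it is not real GloVe data), and this is not the project's exact implementation.

```python
import numpy as np

def aggregate_embedding(tweet, embeddings, dim):
    """Average the word vectors of all known words in the tweet.

    Words missing from the embedding dictionary are skipped; if no
    word is known, a zero vector is returned. (Illustrative sketch.)
    """
    vectors = [embeddings[word] for word in tweet.split() if word in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Made-up 3-dimensional "GloVe" vectors for illustration only:
toy_glove = {
    "good": np.array([1.0, 0.0, 1.0]),
    "day": np.array([0.0, 1.0, 1.0]),
}
vec = aggregate_embedding("good day everyone", toy_glove, dim=3)
# vec = [0.5, 0.5, 1.0] ("everyone" is out of vocabulary and skipped)
```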