Jupyter Notebook Python Anaconda PyTorch scikit-learn NumPy

Machine Learning for Binary Text Classification

CS-433 Project 2, 2021, EPFL Text classification

Author:

Submission:

#169479

Abstract:

The goal of this project was to build a model that accurately classifies tweets as either positive or negative. The project contains six models: three classic machine learning models and three neural networks. The best-performing model is a neural network built on the pre-trained Bidirectional Encoder Representations from Transformers (BERT). This transfer-learning model achieved an accuracy of 89.3% and an F1 score of 89.6%.
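For readers unfamiliar with the two metrics quoted above, here is a minimal, dependency-free sketch of how accuracy and F1 can be computed for binary labels. The function name `accuracy_and_f1`, the label encoding (1 for positive, -1 for negative) and the toy labels are assumptions made for illustration; this is not the project's evaluation code.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels, made up for illustration (not project data).
y_true = [1, 1, 1, -1, -1, -1]
y_pred = [1, 1, -1, -1, -1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

In practice, scikit-learn's `accuracy_score` and `f1_score` provide the same quantities; the hand-written version is only meant to make the definitions explicit.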

Setup:

This is a step-by-step guide to setting up your environment so that you can run run.py, which creates the submission file.

Prerequisites

  • conda
  • pip3
  • python3
  • Download 'epfml-text' from here, unzip it, and add it to the /twitter-datasets folder.

Installation

  1. Clone the repo and enter the directory text_classification
    git clone https://github.com/StormFlaate/text_classification
    cd text_classification
  2. Create the environment
    conda create --name text_classification
  3. Activate the environment
    conda activate text_classification
  4. Install the conda dependencies
    conda install --file requirements.txt && conda install -c huggingface transformers
  5. Install the pip dependencies
    pip3 install -r requirements_pip.txt
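After the installation steps, a quick sanity check can confirm that the main libraries are importable from the active environment. The sketch below uses only the standard library. The list of import names is an assumption based on the libraries this README mentions (PyTorch, scikit-learn, NumPy, transformers), not on the contents of requirements.txt, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed import names, taken from the libraries mentioned in this README.
ASSUMED_IMPORT_NAMES = ["torch", "sklearn", "numpy", "transformers"]


def missing_packages(import_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


missing = missing_packages(ASSUMED_IMPORT_NAMES)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All checked packages are importable.")
```

Run it with the text_classification environment activated; any name it reports as missing points to an install step that did not complete.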

Overview:

Setup files:

Machine learning models:

Run files:

  • run.py: file containing everything to recreate best submission

Helper functions:

  • helper functions: contains all helper functions and classes used in the project

Folders:

  • twitter-datasets: will contain all datasets used for this project; these need to be downloaded manually.
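Because the datasets are downloaded manually, it can help to verify that the folder is populated before running run.py. The following is a minimal sketch; the file names in `ASSUMED_FILES` are assumptions about what the 'epfml-text' archive contains and should be adjusted to match the actual download, and `missing_dataset_files` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# Assumed file names; adjust to match the contents of the downloaded archive.
ASSUMED_FILES = ["train_pos.txt", "train_neg.txt", "test_data.txt"]


def missing_dataset_files(folder, expected=ASSUMED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


missing = missing_dataset_files("twitter-datasets")
if missing:
    print("Missing from twitter-datasets:", ", ".join(missing))
else:
    print("All expected dataset files were found.")
```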