ml_project

Project for 'DS-GA 1003 Machine Learning'

Toxic Comment Classification Challenge

Requirements

Run the following commands to install the required packages:
pip3 install -r requirements.txt
pip3 install -U spacy

Data

Download the data from the Toxic Comment Classification Challenge webpage.

Usage

Navigate to the src folder.

Machine Learning models: Run the following commands in order:

  • python3 preprocessing.py
  • python3 models.py
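
The two steps above can also be driven from one Python script. This is a minimal sketch, not part of the repository: `run_in_order` is a hypothetical helper, and it assumes the given working directory is the src folder that contains both scripts.

```python
import subprocess
import sys


def run_in_order(scripts, cwd="."):
    """Run each script with the current interpreter, stopping at the first failure.

    Hypothetical helper for illustration; the repository does not provide it.
    """
    completed = []
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later scripts never run after an earlier one has failed.
        subprocess.run([sys.executable, script], cwd=cwd, check=True)
        completed.append(script)
    return completed
```

From the src folder, `run_in_order(["preprocessing.py", "models.py"])` would run preprocessing first and start the models step only if preprocessing exits cleanly.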

fastText models: Run the fasttext notebook.

Deep Learning models: Run the deeplearning and deeplearning2 notebooks.

Results

All vectorized n-grams, AUC-ROC summary dataframes, predictions, and probabilities are saved to the pickle_objects/ folder.
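
To inspect these outputs after a run, something like the following may help. It is a minimal sketch that assumes the objects were written with Python's standard `pickle` module and carry a `.pkl` extension; the actual file names and extension are set by the project's scripts and may differ. `load_pickles` is a hypothetical helper, not part of the repository.

```python
import pickle
from pathlib import Path


def load_pickles(folder="pickle_objects", pattern="*.pkl"):
    """Load every matching pickle file directly inside `folder`.

    Returns a dict mapping each file's stem (name without extension) to the
    unpickled object. The "*.pkl" pattern is an assumption; adjust as needed.
    """
    objects = {}
    for path in sorted(Path(folder).glob(pattern)):
        with path.open("rb") as handle:
            # Only unpickle files you produced yourself: pickle can execute code.
            objects[path.stem] = pickle.load(handle)
    return objects
```

Unpickling dataframes or fitted models requires the libraries that created them to be installed in the current environment.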

Models and ROC curve plots are saved to pickle_objects/models/ and plots/, or to pickle_objects/models_features/ and plots_features/ if you choose to use extra features (see models.py).