/Toxic-Comment-Classification

Using Deep Learning Models to classify text

Toxic comment classification

This repo was built as a test of deep NLP, using the toxic comment classification data from Kaggle.

Another main motivation was to try out deep NLP models. Those used were BERT, a pooled RNN, and fastai's ULMFiT.

NOTE: Check output for results; it contains the fastai classification and pooled RNN results. Both models use a sigmoid output, so each class gets its own percentage.
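To make the note on sigmoid outputs concrete, here is a minimal sketch of how per-class percentages come out of a multi-label model. The logit values are made up for illustration, and the label names are the six columns of the Kaggle dataset. None of this is taken from the repo's own code.

```python
import math

# The six label columns of the Kaggle toxic comment dataset.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_percentages(logits):
    """Apply a sigmoid to each class score independently.

    Unlike softmax, the values do not have to sum to 100%, so a comment
    can score high on "toxic" and "insult" at the same time.
    """
    return {label: round(100 * sigmoid(z), 1) for label, z in zip(LABELS, logits)}


# Hypothetical logits for a single comment (illustrative values only).
example_logits = [2.0, -3.0, 0.0, -4.0, 1.0, -2.0]
scores = to_percentages(example_logits)
print(scores)
```

Because every class is scored on its own, a threshold (for example 50%) can be applied per label rather than picking a single winning class.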

Install

git clone 

pip install -r requirements.txt

Download the toxic comment classification dataset from Kaggle.

Put it in the folder data/toxic_comment.
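A quick way to check that the data landed in the right place is to read a few rows back. The sketch below assumes the Kaggle training file is named train.csv and has a comment_text column plus the six label columns. The load_comments helper is hypothetical and not part of this repo.

```python
import csv
from pathlib import Path

# Assumed location, following the folder named above.
DATA_DIR = Path("data/toxic_comment")
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_comments(csv_path, limit=5):
    """Read up to `limit` rows and return (text, {label: 0 or 1}) pairs."""
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            labels = {name: int(row[name]) for name in LABELS}
            rows.append((row["comment_text"], labels))
            if len(rows) >= limit:
                break
    return rows


train_path = DATA_DIR / "train.csv"  # assumed file name
if train_path.exists():
    for text, labels in load_comments(train_path):
        print(labels, text[:60])
else:
    print(f"Expected the dataset at {train_path}")
```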

BERT

Make sure to put the fine-tuned model inside the model folder within the bert folder.

NOTE: For BERT training, check out the notebook.

cd bert
python bert_test.py --text "You are dumb" # For a single prediction

or 

python bert_test.py --interactive # For console input
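For readers curious how the two modes could be wired up, here is a minimal argparse sketch. It is not the contents of bert_test.py: the predict function is a hypothetical placeholder for the real model call, and the use of nargs="+" (which accepts the text with or without quotes) is an assumption.

```python
import argparse


def predict(text):
    """Hypothetical placeholder: the real script would run the BERT model here."""
    return {"text": text, "scores": None}


def build_parser():
    parser = argparse.ArgumentParser(description="Toxic comment prediction (sketch)")
    # Exactly one of the two modes must be chosen.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--text", nargs="+", help="comment to classify")
    group.add_argument("--interactive", action="store_true",
                       help="read comments from the console")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.interactive:
        # Keep reading lines until an empty line or end of input.
        while True:
            try:
                line = input("> ")
            except EOFError:
                break
            if not line.strip():
                break
            print(predict(line))
    else:
        print(predict(" ".join(args.text)))
```

Calling main(["--text", "You", "are", "dumb"]) runs a single prediction, while main(["--interactive"]) starts the console loop.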

Pooled RNN

python train_attention.py # train

python eval.py # Eval or generate csv output
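The name "pooled RNN" usually refers to a recurrent encoder whose per-timestep outputs are summarized by max pooling and average pooling before the classifier. The sketch below shows only that pooling step in plain Python, with made-up hidden states. Whether train_attention.py uses exactly this layout is an assumption, so treat the scripts themselves as the reference.

```python
def pool_hidden_states(hidden_states):
    """Concatenate max pooling and average pooling over the time axis.

    hidden_states: list of timesteps, each a list of floats of equal length.
    Returns one vector of length 2 * hidden_size.
    """
    if not hidden_states:
        raise ValueError("need at least one timestep")
    columns = list(zip(*hidden_states))  # one tuple per hidden unit
    max_pool = [max(col) for col in columns]
    avg_pool = [sum(col) / len(col) for col in columns]
    return max_pool + avg_pool


# Three timesteps with a hidden size of 2 (illustrative values only).
states = [[0.1, -0.5], [0.7, 0.2], [0.4, 0.0]]
pooled = pool_hidden_states(states)
print(pooled)
```

The pooled vector has a fixed length regardless of how long the comment is, which is what lets a dense sigmoid layer sit on top of it.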

Models

Model        Download Link
BERT         Link
Pooled RNN   Link

BERT Model Training

Trained 3 times with 2 epochs each

First cycle

[Image: file_structure]

Second cycle

[Image: file_structure]

Third cycle

[Image: file_structure]

Environment

  • Ubuntu 18.04
  • CUDA 9
  • NVIDIA GTX 1080
  • cuDNN 7.4

Dependencies

  • nltk
  • tensorflow-gpu=1.9
  • keras=2.2.4
  • pytorch=1.1.0
  • fastai
  • torchvision=0.3.0

ToDo

  • Train BERT model and test output
  • Train fastai ULMFiT and test output
  • Move from the pytorch-pretrained-bert package to the latest transformers package

Acknowledgement

  • The BERT and fastai code was heavily inspired by this repo; check out the implementation here.

  • The pooled RNN Keras code was also heavily inspired by the following repo; check it out here.

  • For EDA, the following GitHub repo served as a backbone for the project; those interested can check it out here.