Toxic Comment Classification

Yet another toxic comment classifier, trained on the Jigsaw data with pretrained word vectors (GloVe, Google News, fastText).

Installation

Requirements:

- Python 3.7 or higher
- GNU Make
- CUDA 10.2 or higher

Clone the repo to your local machine:

git clone https://github.com/halecakir/toxic-comment-classification

Build the Python virtual environment:

make venv/bin/activate

Fetch pretrained word vectors from all supported sources (GloVe, Google News, fastText):

make fetch_all

Train the model on the Jigsaw data, choosing one of the fetched word vector files:

make train ARGS=WORD_VECTOR  # WORD_VECTOR ∈ {"google.bin", "fasttext.bin", "glove.txt"}
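
For orientation, here is a minimal sketch of how a text-format vector file such as glove.txt could be read. It assumes the file follows the standard GloVe text layout (one word per line, followed by space-separated floats). The function name `load_glove_text` and the sample data are hypothetical and are not taken from this repo's code.

```python
import io


def load_glove_text(handle):
    """Parse lines of the form '<word> <float> <float> ...' into a dict.

    Hypothetical helper: assumes the standard GloVe text layout.
    """
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a real glove.txt file (made-up values).
sample = io.StringIO("toxic 0.1 -0.2 0.3\nkind 0.4 0.5 -0.6\n")
vectors = load_glove_text(sample)
print(len(vectors), "words loaded")
```

The .bin files are binary and need a different reader, so this sketch covers only the .txt case.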

Test the model:

make test

Remove all model artifacts:

make clean

TODO

- Try an attention mechanism
- Try transformer-based models
- Try hybrid (word-level + character-level) word vectors for words that have no pretrained vectors
- Try gradient clipping to counter exploding gradients
- Add hyperparameter optimization
- Add sanity tests
- Write documentation
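
As a starting point for the gradient clipping item, here is a minimal, framework-free sketch of clipping by global L2 norm. It is an illustration only and not part of the current training code. The function name `clip_by_global_norm` and the example gradients are hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their joint L2 norm is at most max_norm.

    grads is a list of gradient vectors (lists of floats), one per parameter.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Made-up gradients for two parameters; their joint norm is 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

If the model is built with PyTorch (an assumption, since the framework is not stated here), `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` applies the same idea in place between the backward pass and the optimizer step.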