/kishan-dat-toxic-comment-challenge

Team Kishan-and-Dat R&D for Toxic Comment Challenge

Primary language: Jupyter Notebook. License: MIT.

Toxic Comments Classification Challenge

This repository documents Kishan Manani and Dat Nguyen's submission to the Toxic Comments Classification Challenge hosted on Kaggle. We explored a variety of methods and tools, including bi-directional LSTMs with word embeddings using Keras, gradient boosted trees using LightGBM and XGBoost, logistic regression, and LASSO, along with standard text processing methods such as TF-IDF. We also used model stacking, a form of ensembling that is sometimes loosely called blending.
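To make the TF-IDF plus logistic regression approach concrete, here is a minimal sketch using scikit-learn. It is not taken from the notebooks: the toy comments, the two label names, and the hyperparameters (`ngram_range`, `sublinear_tf`, `C`) are illustrative assumptions, and the real competition data has more labels and far more rows.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Made-up comments and labels for illustration only (not competition data).
comments = [
    "you are an idiot",
    "thanks for the helpful edit",
    "I will hurt you, idiot",
    "great article, well sourced",
    "shut up you stupid idiot",
    "please add a citation here",
]
label_names = ["toxic", "threat"]  # hypothetical subset of labels
labels = np.array([
    [1, 0],
    [0, 0],
    [1, 1],
    [0, 0],
    [1, 0],
    [0, 0],
])

# TF-IDF turns each comment into a sparse weighted bag of word n-grams;
# one-vs-rest fits an independent logistic regression per label.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    OneVsRestClassifier(LogisticRegression(C=10.0, max_iter=1000)),
)
model.fit(comments, labels)

# One probability per (comment, label) pair, shape (n_comments, n_labels).
probs = model.predict_proba(["what an idiot", "nice work on the references"])
for text_probs in probs:
    print(dict(zip(label_names, text_probs.round(3))))
```

Treating each label as a separate binary problem is one common way to handle multi-label text classification; the notebooks may set this up differently.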

Notebook kernels

Here are some of the modelling ideas we explored during the competition.

  1. Exploratory data analysis
  2. Baseline model using logistic regression with TF-IDF features
  3. Gradient boosting
  4. Bi-directional LSTMs with word embeddings
  5. Model ensembling
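
As an illustration of the ensembling idea, the sketch below blends the predicted probabilities of several models with a weighted average. This is only one simple way to ensemble and is not necessarily what the notebooks do; the function name `blend`, the weights, and the example probabilities are all hypothetical. Full stacking would instead train a second-level model on out-of-fold predictions from the base models.

```python
def blend(predictions, weights=None):
    """Weighted average of per-model probability matrices.

    predictions: list of matrices (one per model), each a list of rows,
                 where rows are comments and columns are labels.
    weights:     optional list of non-negative model weights; defaults
                 to equal weights. Weights are normalised to sum to 1.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    n_rows = len(predictions[0])
    n_cols = len(predictions[0][0])
    blended = [[0.0] * n_cols for _ in range(n_rows)]
    for model_preds, weight in zip(predictions, weights):
        for i, row in enumerate(model_preds):
            for j, prob in enumerate(row):
                blended[i][j] += (weight / total) * prob
    return blended


# Made-up probabilities from two hypothetical models for two comments
# and two labels.
logreg_probs = [[0.9, 0.1], [0.2, 0.0]]
lstm_probs = [[0.7, 0.3], [0.4, 0.2]]

equal_blend = blend([logreg_probs, lstm_probs])
lstm_heavy_blend = blend([logreg_probs, lstm_probs], weights=[1.0, 3.0])
```

With equal weights each blended value is the plain mean of the two models' probabilities; in practice the weights would be tuned on held-out data.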