Toxic Overflow: Detecting Hateful Comments

Contributors: Nur Bengisu Cam, Furkan Caglayan, Ahmet Burak Kahraman

You can view the paper and the presentation of the project.

1. Introduction

In this project we investigated whether toxic comments and insults can be identified with machine learning algorithms. We implemented Naive Bayes, Decision Tree, SVC and AdaBoost classifiers. We also examined the effects of word2vec-based text augmentation.
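To give a rough idea of what word2vec-based text augmentation can look like, here is a minimal sketch. It is not the project's actual procedure: the `TOY_VECTORS` table, the `nearest_word` and `augment` helpers, and the `replace_prob` parameter are all assumptions made for illustration, and the 2-D vectors are made up in place of trained embeddings. The idea is to replace a word with its nearest neighbour in embedding space to produce a new training sentence.

```python
import math
import random

# Toy 2-D vectors standing in for trained word2vec embeddings (made up).
TOY_VECTORS = {
    "stupid": (0.9, 0.1),
    "dumb": (0.88, 0.15),
    "idea": (0.1, 0.9),
    "plan": (0.12, 0.85),
    "great": (-0.8, 0.3),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_word(word, vectors):
    """Return the most similar other word, or None if the word is unknown."""
    if word not in vectors:
        return None
    candidates = [w for w in vectors if w != word]
    if not candidates:
        return None
    return max(candidates, key=lambda w: cosine(vectors[word], vectors[w]))


def augment(sentence, vectors, replace_prob=0.5, seed=0):
    """Replace known words with their nearest neighbour with some probability."""
    rng = random.Random(seed)
    out = []
    for token in sentence.split():
        neighbour = nearest_word(token, vectors)
        if neighbour is not None and rng.random() < replace_prob:
            out.append(neighbour)
        else:
            out.append(token)
    return " ".join(out)


augmented = augment("that is a stupid idea", TOY_VECTORS, replace_prob=1.0)
print(augmented)
```

With real embeddings the lookup table would come from a trained word2vec model, and a lower `replace_prob` would keep the augmented sentences closer to the originals.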

2. How to Run?

Make sure everything in requirements.txt is installed, then run main.ipynb. You can try out different classifiers by extending scripts/classification/_Classifier. Just make sure the fit() and predict() functions are correctly implemented.
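As a rough illustration of the fit()/predict() contract, here is a minimal sketch of a custom classifier. The interface of the base class in scripts/classification/_Classifier is not shown in this README, so the sketch stands alone instead of subclassing it; the `MajorityClassifier` name, the argument names, and the example comments are assumptions.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most common training label.

    In the repository this would extend the base class in
    scripts/classification/_Classifier; its exact interface is an
    assumption here, so this sketch does not inherit from it.
    """

    def __init__(self):
        self.majority_label_ = None

    def fit(self, X, y):
        """Remember the most frequent label seen in the training data."""
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        if len(y) == 0:
            raise ValueError("cannot fit on an empty dataset")
        self.majority_label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        """Return one prediction per input sample."""
        if self.majority_label_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.majority_label_ for _ in X]


# Made-up example data, for illustration only.
comments = ["thanks for the edit", "you are an idiot", "nice article"]
labels = [0, 1, 0]

clf = MajorityClassifier().fit(comments, labels)
predictions = clf.predict(["some new comment", "another one"])
print(predictions)
```

A baseline like this is also a useful sanity check: any real classifier should beat it on the evaluation metrics.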

3. Blogs

We documented our progress under the bbm406f19 publication on Medium. You can read the posts to follow our thought process. Don't forget to give claps 😇

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

4. Dataset

We decided to use the Wikipedia comments that were used in the Toxic Comment Classification Challenge. The dataset has the following labels:

  • toxic

  • severe_toxic

  • obscene

  • threat

  • insult

  • identity_hate
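The labels are not mutually exclusive: a single comment can carry several of them, or none. The sketch below shows one way to read such rows and count how often each label occurs. It assumes a CSV layout with `id` and `comment_text` columns followed by one 0/1 column per label; the sample rows are made up for illustration and are not taken from the dataset, and the helper names are hypothetical.

```python
import csv
import io

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
a1,"Thanks for fixing the typo.",0,0,0,0,0,0
a2,"You are a complete idiot.",1,0,0,0,1,0
a3,"I will find you, idiot.",1,0,0,1,1,0
"""


def load_rows(handle):
    """Parse the CSV into (comment_text, {label: 0 or 1}) pairs."""
    rows = []
    for record in csv.DictReader(handle):
        flags = {label: int(record[label]) for label in LABELS}
        rows.append((record["comment_text"], flags))
    return rows


def label_counts(rows):
    """Count how many comments carry each label."""
    counts = {label: 0 for label in LABELS}
    for _, flags in rows:
        for label in LABELS:
            counts[label] += flags[label]
    return counts


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
# Comments with no label set at all are the non-toxic ones.
clean = sum(1 for _, flags in rows if not any(flags.values()))
print(counts, clean)
```

To run this on the real data, pass an open file handle for the downloaded training file to `load_rows` instead of the in-memory sample.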

5. License

This work is licensed under the MIT License.