Contributors: Nur Bengisu Cam, Furkan Caglayan, Ahmet Burak Kahraman
You can view the paper and the presentation of the project.
In this project, we investigated whether toxic comments and insults can be identified using machine learning algorithms. We implemented Naive Bayes, Decision Tree, SVC, and AdaBoost classifiers. We also examined the effect of word2vec-based text augmentation.
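As a rough illustration of embedding-based text augmentation, the sketch below replaces a word with its nearest neighbour in vector space to create a new training sentence. This is one common approach and is only an assumption about how augmentation could work; it is not taken from the project's code. The `TOY_VECTORS` table, `nearest_word`, and `augment` are hypothetical names, and the 2-d vectors are invented stand-ins for real trained word2vec vectors.

```python
import math
import random

# Toy 2-d "embeddings", invented for illustration only.
# A real run would load trained word2vec vectors instead.
TOY_VECTORS = {
    "stupid": (0.9, 0.1),
    "dumb": (0.85, 0.15),
    "idiot": (0.8, 0.3),
    "nice": (-0.7, 0.6),
    "kind": (-0.75, 0.55),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_word(word, vectors):
    """Return the most similar other word, or None if the word is unknown."""
    if word not in vectors:
        return None
    target = vectors[word]
    candidates = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return max(candidates)[1]


def augment(comment, vectors, replace_prob=0.5, seed=0):
    """Swap known words for their nearest neighbour with probability replace_prob."""
    rng = random.Random(seed)
    out = []
    for token in comment.split():
        neighbour = nearest_word(token.lower(), vectors)
        if neighbour is not None and rng.random() < replace_prob:
            out.append(neighbour)
        else:
            out.append(token)
    return " ".join(out)
```

Calling `augment("you are stupid", TOY_VECTORS, replace_prob=1.0)` swaps every word that has a known vector and leaves the rest untouched. The augmented sentence keeps the label of the original comment, which is the usual reason for using this technique on under-represented classes.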
Make sure everything listed in requirements.txt is installed, then run main.ipynb. To try a different classifier, extend scripts/classification/_Classifier and make sure its fit() and predict() functions are implemented correctly.
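A minimal sketch of what a custom classifier might look like is shown below. The `_Classifier` class here is a stand-in written for this example, since the real base class in scripts/classification may have a different constructor or extra methods; `MajorityClassifier` is a hypothetical name. Only the fit() and predict() requirement comes from the project itself.

```python
from collections import Counter


class _Classifier:
    """Stand-in for scripts/classification/_Classifier (the real class may differ)."""

    def fit(self, X, y):
        raise NotImplementedError

    def predict(self, X):
        raise NotImplementedError


class MajorityClassifier(_Classifier):
    """Hypothetical baseline that always predicts the most frequent training label."""

    def __init__(self):
        self.majority_label = None

    def fit(self, X, y):
        # Ignore the features and remember the most common label.
        self.majority_label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        if self.majority_label is None:
            raise RuntimeError("call fit() before predict()")
        # One prediction per input sample.
        return [self.majority_label for _ in X]
```

A baseline like this is mainly useful as a sanity check: on a heavily imbalanced dataset, any real classifier should beat it.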
We documented our progress in the bbm406f19 publication on Medium. You can read the posts to follow our thought process. Don't forget to give claps 😇
We used the Wikipedia comments from the Toxic Comment Classification Challenge. The dataset has the following labels:
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
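The labels above are not mutually exclusive, so a single comment can carry several of them at once. The sketch below shows one way to read such rows, assuming a CSV layout with an id column, a comment_text column, and one 0/1 column per label. The two sample rows are invented for illustration and are not taken from the dataset; `active_labels` and `load_rows` are hypothetical helper names.

```python
import csv
import io

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Two made-up rows in the assumed column layout (not real dataset entries).
SAMPLE_CSV = """id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
a1,Thanks for the helpful edit,0,0,0,0,0,0
b2,You are an idiot,1,0,0,0,1,0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def active_labels(row):
    """Return the names of all labels set to 1 for this comment."""
    return [name for name in LABELS if row[name] == "1"]


rows = load_rows(SAMPLE_CSV)
for row in rows:
    print(row["id"], active_labels(row))
```

A comment with no active labels is simply a clean comment, so each label can be treated as its own binary classification problem.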
This work is licensed under the MIT license.