Detection of Textual Cyberbullying

Abstract

The goal of this project is to build a model that detects textual cyberbullying with the highest possible accuracy using machine learning techniques.

Methods

Features are extracted from the dataset using bag-of-words and TF-IDF representations.
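
As a rough illustration of the two representations, the sketch below vectorizes a few made-up posts with scikit-learn. The sample sentences and variable names are hypothetical, and the default vectorizer settings are an assumption, not necessarily what the project uses.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy posts; the real project uses the Formspring dataset.
posts = [
    "you are so stupid",
    "have a nice day",
    "nobody likes you, stupid",
]

# Bag of words: one column per vocabulary term, each cell a raw count.
bow_vectorizer = CountVectorizer()
bow_features = bow_vectorizer.fit_transform(posts)

# TF-IDF: counts reweighted so terms that appear in many posts weigh less.
tfidf_vectorizer = TfidfVectorizer()
tfidf_features = tfidf_vectorizer.fit_transform(posts)

# Sorted list of the terms learned from the posts.
vocabulary = sorted(bow_vectorizer.vocabulary_)
```

Both vectorizers return sparse matrices with one row per post and one column per term. With the default tokenizer, single-character tokens such as "a" are dropped.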

The data is then evaluated with a train/test split and 10-fold cross-validation.
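
A minimal sketch of both evaluation schemes with scikit-learn follows. The random data, the 80/20 split ratio, and the use of stratification are assumptions made for illustration; the project may use different settings.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical stand-in data: 100 samples, 5 features, imbalanced binary labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = np.array([0] * 80 + [1] * 20)

# Hold out 20% of the samples as a test set (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 10-fold cross-validation: every sample is used for validation exactly once.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_sizes = [len(val_idx) for _, val_idx in folds.split(X, y)]
```

Stratifying keeps the proportion of bullying and non-bullying posts similar in every split, which can matter when one class is much rarer than the other.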

The training and test data are passed to the following classifiers:

  • Linear SVM
  • Random Forest
  • Gaussian Naive Bayes
  • Decision Tree
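
The four classifiers listed above could be compared along these lines. This is only a sketch: the synthetic data from `make_classification` stands in for the vectorized posts, and the hyperparameters shown are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the vectorized posts (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

classifiers = {
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy of each classifier over 10 cross-validation folds.
mean_accuracy = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
```

Note that `GaussianNB` expects a dense array, so sparse bag-of-words or TF-IDF features need to be converted (for example with `.toarray()`) before being passed to it.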

Results are measured using:

  • Classification report (F1 score, accuracy, precision, recall)
  • Confusion Matrix
  • Kappa Score
  • T-test
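
The measures above map onto scikit-learn and SciPy functions roughly as sketched here. The labels and per-fold scores are made up, and the choice of a paired t-test (`ttest_rel`) between two classifiers' fold scores is an assumption, since the kind of t-test is not specified.

```python
from scipy.stats import ttest_rel
from sklearn.metrics import (
    classification_report,
    cohen_kappa_score,
    confusion_matrix,
)

# Hypothetical true labels and predictions (1 = bullying, 0 = not bullying).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

# Text table with precision, recall, F1 score, and accuracy.
report = classification_report(y_true, y_pred)

# Rows are true classes, columns are predicted classes: [[3, 1], [1, 3]].
matrix = confusion_matrix(y_true, y_pred)

# Cohen's kappa: agreement corrected for chance (0.5 for this toy data).
kappa = cohen_kappa_score(y_true, y_pred)

# Hypothetical per-fold accuracies of two classifiers on the same folds.
scores_a = [0.81, 0.79, 0.84, 0.80, 0.83]
scores_b = [0.76, 0.78, 0.79, 0.75, 0.80]

# Paired t-test: is the difference between the two classifiers significant?
t_stat, p_value = ttest_rel(scores_a, scores_b)
```

A small `p_value` suggests the difference in fold scores between the two classifiers is unlikely to be due to chance alone.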

Results are plotted using Matplotlib.

Languages & Tools Used

scikit-learn, Python 3.6, XML, NumPy, SciPy, Matplotlib

Resources

Dataset: https://www.kaggle.com/swetaagrawal/formspring-data-for-cyberbullying-detection