claireforan/CyberSentimentAnalysis

Python

Detection of Textual Cyberbullying

Abstract

Build a model that detects textual cyberbullying as accurately as possible using machine learning techniques.

Methods

Extract features from the dataset using bag-of-words and TF-IDF
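
As a rough illustration of the two feature representations named above, here is a minimal sketch using scikit-learn's CountVectorizer and TfidfVectorizer. The three posts are made-up placeholders, not rows from the Formspring dataset, and the default vectorizer settings are an assumption rather than the project's actual configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy posts (assumption); the project itself uses the Formspring dataset.
posts = [
    "you are so stupid",
    "have a nice day",
    "nobody likes you, stupid",
]

# Bag of words: one column per vocabulary term, each cell holds a raw count.
bow = CountVectorizer()
X_bow = bow.fit_transform(posts)

# TF-IDF: same vocabulary, but terms that occur in many posts are down-weighted
# and each row is L2-normalised by default.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(posts)

print(sorted(bow.vocabulary_))       # the learned vocabulary
print(X_bow.toarray())               # raw term counts per post
print(X_tfidf.toarray().round(2))    # TF-IDF weights per post
```

Both vectorizers return sparse matrices, which matters later because some classifiers need dense input.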

Then evaluate using a train/test split and 10-fold cross-validation
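
The two evaluation schemes could be sketched as follows. This is only an illustration: the data is synthetic (make_classification), and the 80/20 split ratio, stratification, shuffling and random seeds are assumptions, since the README does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the vectorised posts (assumption, not the real data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold-out evaluation: fit on 80% of the data, score on the remaining 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = LinearSVC(max_iter=10000)
clf.fit(X_train, y_train)
holdout_accuracy = clf.score(X_test, y_test)

# 10-fold cross-validation: every sample is used for testing exactly once.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(LinearSVC(max_iter=10000), X, y, cv=cv)

print("hold-out accuracy:", round(holdout_accuracy, 3))
print("10-fold mean accuracy:", round(cv_scores.mean(), 3))
```

Stratified folds keep the class ratio roughly constant in each fold, which is usually helpful when one class (here, bullying posts) is rare.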

The training and test data are passed to the following classifiers:

Linear SVM
Random Forest
Gaussian Naive Bayes
Decision Tree
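
A minimal sketch of fitting the four classifiers listed above with scikit-learn. The posts and labels are invented placeholders, and the hyperparameters are library defaults or arbitrary choices, not the settings used in this repository.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (assumption): 1 = bullying, 0 = not bullying.
posts = [
    "you are so stupid",
    "have a nice day",
    "nobody likes you loser",
    "great game last night",
    "shut up you idiot",
    "thanks for the help",
    "you are ugly and dumb",
    "see you at school tomorrow",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# GaussianNB requires a dense array, so the sparse TF-IDF matrix is converted.
X = TfidfVectorizer().fit_transform(posts).toarray()

classifiers = {
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and collect its predictions (on the training posts, for brevity).
predictions = {}
for name, clf in classifiers.items():
    clf.fit(X, labels)
    predictions[name] = clf.predict(X)
    print(name, predictions[name])
```

Predicting on the training posts only shows that the models fit; a real comparison would score them on held-out data.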

Results are measured using:

Classification report (precision, recall, F1 score, accuracy)
Confusion Matrix
Kappa Score
T-test
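
The measures above could be computed roughly as shown below. Several things here are assumptions: the labels and predictions are hand-written placeholders, "Kappa Score" is taken to mean Cohen's kappa, and the t-test is taken to be a paired t-test over per-fold cross-validation scores of two classifiers (the README does not say which variant was used).

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hand-written placeholder labels (assumption): 1 = bullying, 0 = not bullying.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# Per-class precision, recall and F1, plus overall accuracy.
report = classification_report(y_true, y_pred)
print(report)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Cohen's kappa: agreement between prediction and truth, corrected for chance.
kappa = cohen_kappa_score(y_true, y_pred)
print("kappa:", round(kappa, 3))

# Paired t-test on per-fold accuracies of two classifiers, using synthetic data
# and identical folds for both models.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores_tree = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
scores_nb = cross_val_score(GaussianNB(), X, y, cv=cv)
t_stat, p_value = ttest_rel(scores_tree, scores_nb)
print("t =", t_stat, "p =", p_value)
```

A paired test is used in this sketch because both classifiers are scored on the same folds, so their scores are not independent samples.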

Results are plotted using Matplotlib
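
One possible way to plot a result is a confusion-matrix heatmap, sketched below. The labels, predictions, class names and output filename are all placeholders chosen for illustration; the repository's actual plots may look different.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Hand-written placeholder labels (assumption): 1 = bullying, 0 = not bullying.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)

class_names = ["not bullying", "bullying"]  # hypothetical display names

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(class_names)
ax.set_yticklabels(class_names)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
ax.set_title("Confusion matrix")

# Write the count inside each cell of the heatmap.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")

fig.colorbar(im, ax=ax)
fig.savefig("confusion_matrix.png")  # hypothetical output filename
```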

Languages & Tools Used

scikit-learn, Python 3.6, XML, NumPy, SciPy, Matplotlib

Resources

Dataset: https://www.kaggle.com/swetaagrawal/formspring-data-for-cyberbullying-detection