offensiveTweetDetection

This project attempts to detect offensive tweets using a labeled dataset (agressive_tweet_classed.txt) of about 15,000 tweets containing insulting language. The dataset was derived from the one presented in Davidson et al.'s paper (https://data.world/ml-research/automated-hate-speech-detection-data) by converting it into a binary dataset of offensive and non-offensive tweets. We also provide a dataset of 60K tweets (profane_tweets.txt) that contain offensive words. We used it to train a word2vec model because our original dataset was comparatively small.
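As a rough illustration of the binary conversion, here is a minimal sketch. It assumes the source CSV has `class` and `tweet` columns and that the class codes are 0 (hate speech), 1 (offensive language), and 2 (neither). Treating classes 0 and 1 as offensive is an assumption made for this sketch, and the helper names are hypothetical. The exact mapping and file format used to build agressive_tweet_classed.txt may differ.

```python
import csv
import io

# Assumed class codes: 0 = hate speech, 1 = offensive language, 2 = neither.
# Mapping 0 and 1 to "offensive" is an assumption for illustration only.
OFFENSIVE_CLASSES = {0, 1}


def to_binary_label(original_class):
    """Return 1 for offensive (assumed classes 0 and 1), 0 for non-offensive."""
    return 1 if int(original_class) in OFFENSIVE_CLASSES else 0


def convert_rows(csv_file):
    """Yield (binary_label, tweet_text) pairs from a CSV with 'class' and 'tweet' columns."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield to_binary_label(row["class"]), row["tweet"].strip()


# Tiny in-memory sample standing in for the real file (placeholder text, not real data).
sample = io.StringIO(
    "class,tweet\n"
    "2,what a lovely morning\n"
    "1,placeholder offensive tweet\n"
    "0,placeholder hateful tweet\n"
)

pairs = list(convert_rows(sample))
for label, text in pairs:
    # Write one tab-separated "label<TAB>tweet" line per example.
    print(label, text, sep="\t")
```

To run this against a real file, replace the in-memory sample with `open(path, newline="", encoding="utf-8")` and adjust the column names to match the actual header.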

We also provide two IPython notebooks. The first, offensiveTweetDetection.ipynb, contains classification with traditional machine learning models as well as LSTM and FastText. The second, offensiveTweetDetectionCNN.ipynb, contains the CNN classification models.
