Sarcasm-detection-over-Reddit-Corpus

Course Project for DSE 407 - Natural Language Processing

by Dr. Tanmay Basu and Dr. Jasabanta Patro

The task was to develop an NLP method that identifies sarcastic comments as accurately as possible, learning from a labelled dataset. We used bag-of-words and TF-IDF vectorizers and trained three classifiers: Multinomial Naive Bayes, logistic regression, and a support vector machine. We applied preprocessing techniques such as stop-word removal, lemmatization, and stemming to the dataset and compared their effect on prediction accuracy. Finally, we used the best-performing model to predict the class ('sarcastic' or 'non-sarcastic') of each comment in the given test dataset.
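To illustrate the general idea, here is a minimal, standard-library-only sketch of one of the combinations described above: stop-word removal, bag-of-words counts, and a Multinomial Naive Bayes classifier with Laplace smoothing. It is not the project's code. The stop-word list, the `tokenize` helper, the `MultinomialNaiveBayes` class, and the toy comments are all hypothetical and chosen only for illustration. TF-IDF weighting, lemmatization, stemming, logistic regression, and the support vector machine are not shown.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and",
              "i", "this", "that", "so", "for"}


def tokenize(text, remove_stop_words=True):
    """Lowercase, keep runs of letters/apostrophes, optionally drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        self.class_doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / self.total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha)
                                  / (total_words + self.alpha * vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data, for illustration only (not taken from the Reddit corpus).
train_texts = [
    "Oh great, another Monday, just what I needed",
    "Wow, what a brilliant idea, nothing could go wrong",
    "Yeah right, because that always works so well",
    "The meeting is scheduled for ten tomorrow",
    "Thanks for sharing the link to the article",
    "The update fixed the login bug for me",
]
train_labels = ["sarcastic"] * 3 + ["non-sarcastic"] * 3

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("Oh great, what a brilliant Monday")
print(prediction)
```

With only six training comments this says nothing about real accuracy; a meaningful comparison of vectorizers, preprocessing steps, and classifiers needs the full labelled dataset and a held-out evaluation split.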

Khodak, Mikhail, Nikunj Saunshi, and Kiran Vodrahalli. [A Large Self-Annotated Corpus for Sarcasm](https://doi.org/10.48550/arXiv.1704.05579). Proceedings of the Language Resources and Evaluation Conference (LREC), 2018.