
An NLP program to extract emotional sentiment from topical tweets (NUS, CS4248 Project)


Notion of Twitter Emotion

This is a CS4248 Project done by Team 15.

Introduction

Every second, an average of 6,000 tweets are posted on Twitter, many of which express some form of emotion. We aim to determine the emotions embedded in these tweets using natural language processing (NLP) techniques and to generalise the emotional sentiment attached to different topics. Using this model, we aim to design a tool that generates sentiment reports that could be useful to researchers. To that end, we conceptualised and implemented our own preprocessing methods and compared Support Vector Machine, Multinomial Naive Bayes, Random Forest and k-Nearest Neighbours classifiers to build, train and tune our model.
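Our actual preprocessing methods are described in the project report. As a rough illustration of what tweet preprocessing typically involves, here is a minimal sketch using only the Python standard library. The function name `preprocess_tweet` and the cleaning rules (dropping URLs and @mentions, keeping hashtag words, lowercasing) are assumptions made for illustration, not the methods implemented in notion_emotion_twitter.py.

```python
import re
from typing import List

# Hypothetical cleaning rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @usernames
NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")       # punctuation and symbols


def preprocess_tweet(text: str) -> List[str]:
    """Lowercase a tweet, strip URLs and mentions, and return word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the '#'
    text = NON_WORD_RE.sub(" ", text)  # remove remaining punctuation
    return text.split()
```

Tokens produced this way could then be turned into features (for example, word counts) before being passed to a classifier.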

Project Report

Please refer to our project report for the results of our analysis.

Quick Start

To run our model, first set up the following pre-requisites:

Pre-requisites

First, clone our repository by running this command:

git clone https://github.com/grrrrnt/notion-emotion-twitter.git

Second, install the required libraries by running this command in the root directory:

pip3 install -r requirement.txt

Lastly, with the data file saved as text_emotion.csv, run our models with this command:

python3 notion_emotion_twitter.py
  • To compare performance across models, uncomment line 384 of notion_emotion_twitter.py and comment out line 385.
  • To change the model being run, edit line 234 of notion_emotion_twitter.py.
  • To select different features, edit line 238 of notion_emotion_twitter.py.
  • To test the Random Forest (RF) model on an unseen dataset, uncomment line 385 of notion_emotion_twitter.py and comment out line 384.
  • To change the unseen dataset, edit line 355 of notion_emotion_twitter.py.
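Before running the models, it can help to confirm that the data file loads and to look at how the emotion labels are distributed. The sketch below does this with the standard library only. The helper name `count_labels` is hypothetical, and the `sentiment` column name is an assumption about the layout of text_emotion.csv; adjust it to match the header of your file.

```python
import csv
from collections import Counter


def count_labels(csv_file, label_column="sentiment"):
    """Count how often each label appears in an open CSV file.

    `label_column` is an assumed column name; change it if your
    dataset uses a different header.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Example usage (assumes text_emotion.csv is in the current directory):
# with open("text_emotion.csv", newline="", encoding="utf-8") as f:
#     for label, count in count_labels(f).most_common():
#         print(label, count)
```

A heavily skewed label distribution is worth knowing about before comparing classifiers, since it affects how accuracy figures should be read.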

Datasets

The datasets we used were obtained from Kaggle: