/Detecting-Depression-In-Tweets

A model to detect depression in tweets. The model was implemented using CNN+LSTM and compared the accuracy with traditional machine learning algorithms such as Logistic Regression, KNN, SVM, and Decision Trees. The trained model was tested on a dataset of COVID-19 related tweets to detect signs of depression.

Primary LanguageJupyter Notebook

Detecting-depression-in-tweets

Overview

A model to detect depression in tweets. The model was implemented using CNN+LSTM. The trained model was tested on a dataset of COVID-19 related tweets to detect signs of depression.

Data Gathering

The dataset consists of labeled data, i.e., an equal number of depression indicative tweets and random tweets (non-depressive). The random tweets dataset was taken from the Kaggle dataset twitter_sentiment. The depression tweets were scraped using TWINT with various depression-related keywords.

Training

The model was trained with Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM) using Keras. Several other Machine Learning algorithms such as Logistic Regression, KNN, SVM, and Decision Trees were implemented to compare the accuracies.

Results

Algorithm Accuracy
Logistic Regression 49.84%
K-Nearest Neighbors 68.96%
Decision Tree 86.94%
Support Vector Machine 68.17%
CNN + LSTM 97%

Application on COVID-19 Dataset

The COVID-19 dataset consisted of tweets that were scraped using TWINT with COVID-19 keywords. This dataset can be found here. The trained model was tested on this dataset and the results as given below -

Total number of Tweets: 244198
Number of Depressive Tweets: 133109
Number of Non-Depressive Tweets: 111089
Percentage of Depressive Tweets: 54.51%