This project implements a model that detects signs of depression in tweets. The model combines a CNN with an LSTM. After training, it was applied to a dataset of COVID-19-related tweets to detect signs of depression.
The training dataset is labeled and balanced: it contains an equal number of depression-indicative tweets and random (non-depressive) tweets. The random tweets were taken from the Kaggle dataset twitter_sentiment. The depressive tweets were scraped using TWINT with various depression-related keywords.
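A minimal sketch of how two such tweet collections could be combined into a balanced, labeled dataset. The function name `build_balanced_dataset`, the label convention (1 for depressive, 0 for non-depressive), and the fixed seed are assumptions for illustration, not taken from the project code.

```python
import random


def build_balanced_dataset(depressive, non_depressive, seed=42):
    """Return shuffled (text, label) pairs with an equal count per class.

    Assumed label convention: 1 = depressive, 0 = non-depressive.
    """
    rng = random.Random(seed)
    # Downsample the larger class so both classes have the same size.
    n = min(len(depressive), len(non_depressive))
    positives = rng.sample(depressive, n)
    negatives = rng.sample(non_depressive, n)
    rows = [(text, 1) for text in positives] + [(text, 0) for text in negatives]
    rng.shuffle(rows)
    return rows


if __name__ == "__main__":
    # Toy stand-ins for the scraped and Kaggle tweets.
    rows = build_balanced_dataset(["d1", "d2"], ["r1", "r2", "r3"])
    print(len(rows))
```

Downsampling the larger class is only one way to balance the data; the original project may have balanced the classes differently.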
The model combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network and was built with Keras. Several other machine learning algorithms (Logistic Regression, K-Nearest Neighbors, Support Vector Machine, and Decision Tree) were also implemented to compare accuracies.
Algorithm | Accuracy |
---|---|
Logistic Regression | 49.84% |
K-Nearest Neighbors | 68.96% |
Decision Tree | 86.94% |
Support Vector Machine | 68.17% |
CNN + LSTM | 97% |
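
The CNN + LSTM combination described above can be sketched in Keras roughly as follows. This is an illustrative layout, not the project's actual architecture: the function name `build_cnn_lstm`, the layer sizes, the vocabulary size, the sequence length, and the dropout rate are all assumptions.

```python
from tensorflow.keras import layers, models


def build_cnn_lstm(vocab_size=20000, max_len=140, embed_dim=128):
    """Build a binary text classifier: embedding -> Conv1D -> LSTM -> sigmoid.

    All hyperparameters here are assumed values for illustration.
    """
    model = models.Sequential([
        layers.Input(shape=(max_len,)),            # padded token-id sequences
        layers.Embedding(vocab_size, embed_dim),   # token ids -> dense vectors
        layers.Conv1D(32, 3, padding="same", activation="relu"),  # local n-gram features
        layers.MaxPooling1D(pool_size=2),          # halve the sequence length
        layers.Dropout(0.2),
        layers.LSTM(64),                           # summarize the feature sequence
        layers.Dense(1, activation="sigmoid"),     # probability of "depressive"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    build_cnn_lstm().summary()
```

In this layout the convolution extracts local phrase-level features and the LSTM models their order across the tweet, which is the usual motivation for stacking the two.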
The COVID-19 dataset consists of tweets scraped using TWINT with COVID-19 keywords. This dataset can be found here. The trained model was run on this dataset, and the results are given below:
Total number of Tweets: 244198
Number of Depressive Tweets: 133109
Number of Non-Depressive Tweets: 111089
Percentage of Depressive Tweets: 54.51%
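
As a rough sketch, the summary figures above could be derived from the model's per-tweet output probabilities like this. The helper name `summarize_predictions` and the 0.5 decision threshold are assumptions; the source does not state which threshold was used.

```python
def summarize_predictions(probabilities, threshold=0.5):
    """Count depressive vs. non-depressive predictions.

    The 0.5 threshold is an assumed default, not taken from the project.
    """
    total = len(probabilities)
    depressive = sum(1 for p in probabilities if p >= threshold)
    return {
        "total": total,
        "depressive": depressive,
        "non_depressive": total - depressive,
        "percent_depressive": round(100 * depressive / total, 2) if total else 0.0,
    }


if __name__ == "__main__":
    # Stand-in probabilities that reproduce the counts reported above.
    fake_probs = [1.0] * 133109 + [0.0] * 111089
    print(summarize_predictions(fake_probs)["percent_depressive"])  # 54.51
```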