A collection of datasets for depression modelling and detection from social media data.

License: GNU General Public License v3.0 (GPL-3.0)

Datasets for depression detection using data posted on online platforms

📚 Data availability

The labels for data availability were inspired by the work of Harrigian et al. (2021), and are explained below:

  • FREE - The dataset is publicly available and hosted online for anyone to access.

  • AUTH - The data can be accessed by contacting the paper's authors.

  • API - The dataset can be reproduced from the details provided in the article using dedicated APIs for different social media platforms with a reasonable degree of effort.

  • DUA - The data is available only after a data usage agreement is signed. Sometimes, authorization from an Institutional Review Board (IRB) may be needed.

  • UNK - The dataset availability is unknown; the authors do not mention if the data is available to the research community.

  • N/AV - The dataset is no longer available or cannot be shared due to ethical considerations.

For the datasets that are publicly available for download or can be accessed through user agreements, we provide the links to the data.
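The availability labels above can also be handled programmatically when filtering the list. The following is a minimal sketch, not part of the original repository: the `Availability` enum, the `DatasetEntry` class, and the `obtainable` helper are hypothetical names introduced for illustration, and the three sample rows are copied from the table below.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class Availability(Enum):
    """Availability labels as defined in this README."""
    FREE = "FREE"  # publicly hosted online
    AUTH = "AUTH"  # available by contacting the paper's authors
    API = "API"    # reproducible through social media platform APIs
    DUA = "DUA"    # requires a signed data usage agreement
    UNK = "UNK"    # availability not stated by the authors
    NAV = "N/AV"   # no longer available or cannot be shared


@dataclass(frozen=True)
class DatasetEntry:
    """One table row, reduced to the fields needed for filtering (hypothetical structure)."""
    name: str
    platform: str
    availability: Availability
    link: Optional[str] = None


# Three rows copied from the table, for illustration only.
SAMPLE = [
    DatasetEntry("RSDD (Yates et al., 2017)", "Reddit", Availability("N/AV")),
    DatasetEntry("Shen et al. (2017)", "Twitter", Availability("FREE"),
                 "https://github.com/sunlightsgy/MDDL"),
    DatasetEntry("eRisk2017 (Losada et al., 2017)", "Reddit", Availability("DUA"),
                 "https://erisk.irlab.org/2017/index.html"),
]


def obtainable(
    entries: Iterable[DatasetEntry],
    labels: Tuple[Availability, ...] = (Availability.FREE, Availability.DUA),
) -> List[DatasetEntry]:
    """Keep entries whose label is in `labels` and that list a link."""
    return [e for e in entries if e.availability in labels and e.link]


for entry in obtainable(SAMPLE):
    print(f"{entry.availability.value}: {entry.name} -> {entry.link}")
```

Passing a different `labels` tuple, for example `(Availability.AUTH,)`, selects the datasets that have to be requested from the authors instead.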

covid-virus - The dataset contains data collected during the COVID-19 pandemic.

| Dataset | Platform | Language | Level | Annotation Procedure | Label | Dataset Size | Availability | Link |
|---|---|---|---|---|---|---|---|---|
| Multitask (Benton et al., 2017) | Twitter | English | USER | Self-disclosure | Labels for multiple disorders | 9.5K users | UNK | |
| RSDD (Yates et al., 2017) | Reddit | English | USER | Self-disclosure | Binary | 116K users | N/AV | |
| Aldarwish and Ahmad (2017) | Twitter, Facebook, LiveJournal | English | POST | Manual annotation | Binary, DSM-IV symptoms | 6.7K posts | API | |
| Reece and Danforth (2017) | Instagram | English | USER | CES-D | Binary | 166 users | UNK | |
| Shen et al. (2017) | Twitter | English | USER | Self-disclosure | Binary | 2.8K users | FREE | https://github.com/sunlightsgy/MDDL |
| 160Users (Jamil et al., 2017) | Twitter | English | USER, POST | Self-disclosure | Binary | 160 users, 8K posts | AUTH | |
| SAD corpus (Mowery et al., 2017) | Twitter | English | POST | Manual annotation | Symptoms, psychological stressors | 9.3K posts | API | |
| Vedula and Parthasarathy (2017) | Twitter | English | USER | Depression-related keywords | Binary | 150 users | API | |
| Hiraga (2017) | Japanese blogging websites | Japanese | USER | Self-disclosure | Binary | 101 users | UNK | |
| eRisk2017 (Losada et al., 2017) | Reddit | English | USER | Self-disclosure | Binary | 887 users | DUA | https://erisk.irlab.org/2017/index.html |
| Yazdavar et al. (2017) | Twitter | English | USER | Self-disclosure | Binary | 47K users | UNK | |
| Rojas-Barahona et al. (2018) | Koko Platform | English | POST | Manual annotation | CBT Concepts | 4035 posts | AUTH | https://github.com/YinpeiDai/NAUM |
| Pirina and Çöltekin (2018) | Reddit | English | POST | Subreddit participation | Binary | 3.6K posts | FREE | https://github.com/Inusette/Identifying-depression/tree/master/Data_Collector |
| Eichstaedt et al. (2018) | Facebook | English | USER | Medical records diagnosis | Binary | 683 users | UNK | |
| Seabrook et al. (2018) | Twitter, Facebook | English | USER | PHQ-9 | Depression severity | 78 users | N/AV | |
| Ricard et al. (2018) | Instagram | English | USER | PHQ-8 | Binary | 749 users | UNK | |
| Shen et al. (2018) | Sina Weibo | Chinese | USER | Self-disclosure | Binary | 1.1K users | UNK | |
| TRT (Wolohan et al., 2018) | Reddit | English | USER | Self-disclosure | Binary | 12K users | UNK | |
| eRisk2018 (Losada et al., 2018) | Reddit | English | USER | Self-disclosure | Binary | 1.1K users | DUA | https://erisk.irlab.org/2018/index.html |
| Loveys et al. (2018) | 7 Cups of Tea | English | USER | Self-disclosure | Binary | 1.9K users | UNK | |
| Chen et al. (2018a) | Twitter | English | USER | Self-disclosure | Labels for multiple disorders | 7.9K users | API | |
| Chen et al. (2018b) | Twitter | English | USER | Self-disclosure | Binary | 7K users | API | |
| RSDD-Time (MacAvaney et al., 2018) | Reddit | English | USER | Self-disclosure | Labels for multiple disorders | 598 users | N/AV | |
| Islam et al. (2018) | Facebook | English | POST | - | Binary | 7K posts | FREE | https://github.com/ranju12345/Depression-Anxiety-Facebook-page-Comments-Text |
| SMHD (Cohan et al., 2018) | Reddit | English | USER | Self-disclosure | Labels for multiple disorders | 350K users | N/AV | |
| Wu et al. (2018) | Facebook | Chinese | USER | CES-D | Binary | 1.4K users | UNK | |
| Hemtanon and Kittiphattanabawon (2019) | Facebook | Thai | POST | Manual annotation | Binary | 1.5K posts | UNK | |
| Wang et al. (2019) | Sina Weibo | Chinese | POST | Manual annotation | Depression severity | 13.9K users | UNK | |
| Gui et al. (2019) | Twitter | English | USER | Self-disclosure | Binary | 2.8K users | API | |
| Chandra Guntuku et al. (2019) | Twitter | English | USER | BDI | Binary | 887 users | UNK | |
| Almouzini et al. (2019) | Twitter | English | USER, POST | Manual annotation | Binary | 89 users | UNK | |
| Leis et al. (2019) | Twitter | Spanish | USER, POST | Self-disclosure, manual annotation | Binary | 540 users, 1K posts | FREE | https://www.kaggle.com/datasets/francescoronzano/spanish-tweets-suggesting-depression |
| Coello-Guilarte et al. (2019) | Twitter | Spanish | USER | Self-disclosure | Binary | 316 users | FREE | https://ccc.inaoep.mx/~mmontesg/resources/CrossLingualDepression.zip |
| Peng et al. (2019) | Sina Weibo | Chinese | USER | Manual annotation | Binary | 387 users | UNK | |
| eRisk2019 (Losada et al., 2019) | Reddit | English | USER | BDI-II | BDI filled-in | 20 users | DUA | https://erisk.irlab.org/2019/index.html |
| Uddin et al. (2019) | Twitter | Bengali | POST | Manual annotation | Binary | 3.8K posts | UNK | |
| Yao et al. (2020) | Sina Weibo | Chinese | USER | Manual, automatic annotation | Binary | 2.7K users | UNK | |
| Owen et al. (2020) | Twitter | English | POST | Manual annotation | Binary | 1K posts | FREE | https://bitbucket.org/nlpcardiff/preemptive-depression-anxiety-twitter/src/master/ |
| Bathina et al. (2021) | Twitter | English | USER | Self-disclosure | Binary | 1.2K users | AUTH | https://github.com/mctenthij/CDS_paper |
| Ríssola et al. (2020) | Reddit | English | POST | Self-disclosure, heuristics | Binary | 14K posts | DUA | |
| Birnbaum et al. (2020) | Facebook | English | USER | Medical records diagnosis | Binary | 223 users | AUTH | |
| Mann et al. (2020) | Instagram | Portuguese | USER | BDI | Binary | 221 users | UNK | |
| Santos et al. (2020) | Twitter | Portuguese | USER | Self-disclosure | Binary | 224 users | UNK | |
| Alghamdi et al. (2020) | | Arabic | POST | Manual annotation | Binary | 20K posts | UNK | |
| Li et al. (2020) | Sina Weibo | Chinese | USER | Self-disclosure | Binary | 1.8K users | FREE | https://github.com/omfoggynight/Chinese-Depression-domain-Lexicon |
| D2S (Yadav et al., 2020) | Twitter | English | POST | PHQ-9 | PHQ-9 symptoms | 12K posts | AUTH | |
| Wang et al. (2020) | Sina Weibo | Chinese | USER | Depression-related keywords | Binary | 32K users | FREE | https://github.com/aidenwang9867/Weibo-User-Depression-Detection-Dataset |
| eRisk2020 (Losada et al., 2020) | Reddit | English | USER | BDI-II | BDI filled-in | 90 users | DUA | https://erisk.irlab.org/2020/index.html |
| Stankevich et al. (2020) | VKontakte | Russian | USER | BDI | BDI score | 1.3K users | UNK | |
| covid-virus Tabak and Purver (2020) | Twitter | English, French, German, Italian, Spanish | USER | Self-disclosure | Binary | 5K users | API | |
| Yazdavar et al. (2020) | Twitter | English | USER | Manual annotation | Binary | 8.7K users | DUA | |
| Wołk et al. (2021) | Facebook, Reddit | Polish | POST | Self-disclosure, clinical interview | Binary | 262 users | UNK | |
| Haque et al. (2021) | Reddit | English | POST | Subreddit participation | Depression vs. suicide | 1.8K posts | FREE | https://github.com/ayaanzhaque/SDCNL |
| Chiu et al. (2021) | Instagram | English, Chinese | USER | Depression-related keywords | Binary | 520 users | UNK | |
| Nanomi Arachchige et al. (2021) | Online forums | English | POST | Manual annotation | Depression severity | 2.1K posts | UNK | |
| Hämäläinen et al. (2021) | Online blogs | Thai | POST | Manual annotation | Binary | 900 posts | FREE | https://zenodo.org/record/4734552 |
| Sherman et al. (2021) | Reddit | English | USER | Self-disclosure | Binary | 31K users | DUA | |
| Yang et al. (2021) | Sina Weibo | Chinese | POST | Manual annotation | Depression severity | 6.1K posts | AUTH | |
| eRisk 2021 (Parapar et al., 2021) | Reddit | English | USER | BDI-II | BDI filled-in | 170 users | DUA | https://erisk.irlab.org/2021/index.html |
| Pirayesh et al. (2021) | Twitter | English | USER | Self-disclosure | Binary | 817 users | AUTH | |
| Niimi (2021) | TOBYO | Japanese | USER | Blog theme | Binary | 901 users | UNK | |
| Musleh et al. (2021) | Twitter | Arabic | USER, POST | CES-D and self-disclosure | Binary, DSM-5 symptoms | 4.5K posts | UNK | |
| Guo et al. (2021) | Reddit | English | USER | Self-disclosure | Labels for multiple disorders | 7.9K users | API | |
| covid-virus Zhang et al. (2021) | Twitter | English | USER | Self-disclosure | Binary | 5K users | API | |
| covid-virus Cohrdes et al. (2021) | Twitter | German | POST | Automatic annotation for PHQ-8 symptoms | Binary | 88K posts | AUTH | |
| covid-virus Zhou et al. (2021) | Twitter | English | USER | Self-disclosure | Binary | 1.8M posts | API | |
| Safa et al. (2022) | Twitter | English | USER | Self-disclosure | Binary | 1.1K users | AUTH | |
| Maghraby and Ali (2022) | Twitter | Arabic | POST | PHQ-9 | PHQ-9 symptoms | 1.2K posts | FREE | https://data.mendeley.com/datasets/myrb2gky8w/1 |
| Naseem et al. (2022) | Reddit | English | POST | Manual annotation | Depression severity | 3.5K posts | FREE | https://github.com/usmaann/Depression_Severity_Dataset |
| PsySym (Zhang et al., 2022) | Reddit | English | USER, POST | Automatic and manual annotation | DSM-5 symptoms for multiple disorders | 26K users, 8.5K posts | AUTH | https://github.com/blmoistawinde/EMNLP22-PsySym |
| MHB (Boinepelli et al., 2022) | Mental health forums | English | USER | Forum participation | Only depression | 9.3K users | FREE | https://www.dropbox.com/sh/66nousl8j0j5ull/AACwRnzJjszl3Eys8ZjQnMVya?dl=0 |
| CAMS (Garg et al., 2022) | Reddit | English | POST | Manual annotation | Causes for depression | 3.1K posts | FREE | https://github.com/drmuskangarg/CAMS |
| Sotudeh et al. (2022) | Reddit | English | POST | Subreddit participation | Summarization | 24K posts | DUA | https://ir.cs.georgetown.edu/resources/mentsum.html |
| Kayalvizhi and Thenmozhi (2022) | Reddit | English | POST | Manual annotation | Depression severity | 16K posts | FREE | https://github.com/Kayal-Sampath/detecting-signs-of-depression-from-social-media-postings/tree/main |
| eRisk2022 (Crestani et al., 2022) | Reddit | English | USER | Self-disclosure | Binary | 3.1K users | DUA | https://erisk.irlab.org/2022/index.html |
| Monreale et al. (2022) | Reddit | English | POST | Subreddit participation | Labels for multiple disorders | 16K posts | API | |
| Kabir et al. (2022) | Facebook | Bengali | POST | Manual annotation | Depression severity | 5K posts | FREE | https://github.com/omanwhatiscomputer/depression-severity/ |
| PRIMATE (Gupta et al., 2022) | Reddit | English | POST | Manual annotation | PHQ-9 symptoms | 2K posts | DUA | https://github.com/primate-mh/Primate2022 |
| PsycheNet-G (Mihov et al., 2022) | Twitter | English | USER | Self-disclosure | Binary | 591 users | UNK | |
| Twitter-STMHD (Singh et al., 2022) | Twitter | English | USER | Self-disclosure, manual annotation | Labels for multiple disorders | 33K users | FREE | https://zenodo.org/record/5854911 |
| multiRedditDep (Uban et al., 2022) | Reddit | English | USER | Self-disclosure | Binary | 3.7K users | AUTH | |
| covid-virus Davis et al. (2022) | Reddit | English | USER | Subreddit participation | Binary | 81K users | API | |
| covid-virus Fernández-Barrera et al. (2022) | Flickr | English | POST | Depression tags | Only depression | 14.5K posts | UNK | |
| covid-virus Cha et al. (2022) | Twitter, Everytime | Korean, English, Japanese | POST | Lexicon-based automatic annotation | Binary | 26M posts, 22K posts | AUTH | |
| DEPTWEET (Kabir et al., 2023) | Twitter | English | POST | Manual annotation | Depression severity | 40K posts | FREE | https://github.com/mohsinulkabir14/DEPTWEET |
| SetembroBR (Ramos dos Santos et al., 2023) | Twitter | Portuguese | USER | Self-disclosure | Binary | 18.8K users | FREE | https://drive.google.com/drive/folders/1MXFRs0u8iF1RNUWABTA0Oz8_Ix1skqZT |
| Alavijeh et al. (2023) | Twitter | English | USER | Self-disclosure | Labels for multiple disorders | 1.5K users | FREE | https://github.com/szamani20/Twitter-Mental-Disorder-Dataset |
| Adarsh et al. (2023) | Reddit | English | POST | Subreddit participation | Binary | 60K posts | UNK | |
| Cai et al. (2023) | Sina Weibo | Chinese | USER | Self-disclosure and manual annotation | Binary | 23K users | FREE | https://github.com/cyc21csri/SWDD |
| Liu et al. (2023) | Reddit | English | POST | Subreddit participation | Symptoms | 1.3M posts | FREE | https://github.com/devanshrj/depression-symptoms-reddit |
| BDI-Sen (Pérez et al., 2023) | Reddit | English | POST | Manual annotation | BDI-II symptoms | 4.9K posts | DUA | https://erisk.irlab.org/BDISen.html |
| SMHD-GER (Zanwar et al., 2023) | Reddit | German | POST | Manual annotation | Labels for multiple disorders | 28K posts | DUA | |
| Song et al. (2023) | Reddit | English | POST | Subreddit participation | Labels for multiple disorders | 85K posts | API | |
| RedditCE (Liang et al., 2023) | Reddit | English | POST | Manual annotation | Emotion-cause labels | 35K posts | FREE | https://github.com/Liulei-nwpu/N2NCause |
| Ghosh et al. (2023) | Facebook, Twitter, YouTube | Bengali | POST | Manual annotation | Binary | 15K posts | AUTH | |
| Li et al. (2023) | Sina Weibo | Chinese | USER | Self-disclosure, manual annotation | Binary | 4.8K users | UNK | |
| Guo et al. (2023) | Sina Weibo | Chinese | USER | Manual annotation | Binary | 3.1K users | UNK | |
| Liu et al. (2023) | Reddit, Twitter | English | USER | Self-disclosure | Binary | 205K users, 255 users | UNK | |
| RESTORE (Yadav et al., 2023) | Reddit, Twitter, Pinterest | English | POST | Manual and automatic annotation | PHQ-9 symptoms | 9.8K images | AUTH | |
| covid-virus Zogan et al. (2023) | Twitter | English | USER | Self-disclosure | Binary | 1.4K users | API | |
| covid-virus Wu et al. (2023) | Twitter | English | USER | Self-disclosure, manual annotation | Binary | 10K users | DUA | https://github.com/dragon-wu/depcov-www2023 |
| DepreSym (Pérez et al., 2023) | Reddit | English | POST | Manual annotation | BDI-II symptoms | 21K posts | DUA | https://erisk.irlab.org/depresym_dataset.html |
| Villa-Pérez et al. (2023) | Twitter | English, Spanish | USER | Self-disclosure | Labels for multiple disorders | 6K users | DUA | https://ieee-dataport.org/documents/twitter-dataset-mental-disorders-detection |
| HelaDepDet (Priyadarshana et al., 2023) | Twitter, Reddit | English | POST | Manual annotation | Depression severity | 40K posts | FREE | https://github.com/KUAS-ubicomp-lab/Depression_Severity_Levels_Dataset |
| MentalRiskES (Mármol Romero et al., 2024) | Telegram | Spanish | USER | Manual annotation | BIN + against/in-favour | 449 users | AUTH | https://github.com/sinai-uja/corpusMentalRiskES |
| Alhamed et al. (2024) | Twitter | English | USER | Manual annotation | Before/After diagnosis | 120 users | FREE | https://github.com/falwah-alhamed/Depression_Tweets/ |

For datasets published before 2017, please refer to https://github.com/kharrigian/mental-health-datasets.