
Download Reddit submissions as CSV

Reddit Data

Download submissions from selected subreddits. The data is exported as a .csv file with all times in UTC:

| Column | Description | Type |
| --- | --- | --- |
| submission | The id of the submission | string |
| subreddit | The subreddit name | string |
| author | The redditor's username | string |
| created | Time the submission was created | number |
| retrieved | Time the submission was retrieved | number |
| edited | Time the submission was modified | number |
| pinned | Whether or not the submission is pinned | number |
| archived | Whether or not the submission is archived | number |
| locked | Whether or not the submission is locked | number |
| removed | Whether or not the submission was removed by a moderator | number |
| deleted | Whether or not the submission was deleted by the user | number |
| is_self | Whether or not the submission is a text post | number |
| is_video | Whether or not the submission is a video | number |
| is_original_content | Whether or not the submission has been set as original content | number |
| title | The title of the submission | string |
| link_flair_text | The text content of the submission's link flair | string |
| upvote_ratio | The percentage of upvotes from all votes on the submission | number |
| score | The number of upvotes for the submission | number |
| gilded | The number of gilded awards on the submission | number |
| total_awards_received | The number of awards on the submission | number |
| num_comments | The number of comments on the submission | number |
| num_crossposts | The number of crossposts on the submission | number |
| selftext | The submission selftext on text posts | string |
| thumbnail | The submission thumbnail on image posts | string |
| shortlink | The submission short url | string |
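
As a rough illustration of how an exported file might be read back, the sketch below parses rows with Python's standard `csv` module. It rests on two assumptions that the table does not confirm: that the time columns hold Unix timestamps in seconds, and that the yes/no columns are stored as 0 or 1. The helper names `parse_row` and `load_submissions` and the sample values are made up for this example.

```python
import csv
import io
from datetime import datetime, timezone

# Columns documented as times; ASSUMED here to be Unix timestamps in seconds (UTC).
TIME_COLUMNS = ("created", "retrieved", "edited")
# Columns documented as yes/no flags; ASSUMED here to be stored as 0 or 1.
FLAG_COLUMNS = ("pinned", "archived", "locked", "removed", "deleted",
                "is_self", "is_video", "is_original_content")


def parse_row(row):
    """Convert the raw strings of one CSV row into Python types (hypothetical helper)."""
    parsed = dict(row)
    for name in TIME_COLUMNS:
        value = row.get(name)
        # Missing, empty, or zero values are treated as "not set" (e.g. never edited).
        parsed[name] = (
            datetime.fromtimestamp(float(value), tz=timezone.utc)
            if value and float(value) > 0 else None
        )
    for name in FLAG_COLUMNS:
        value = row.get(name)
        # Missing or empty flags default to False.
        parsed[name] = bool(int(float(value))) if value else False
    return parsed


def load_submissions(handle):
    """Read all submissions from an open CSV file handle (hypothetical helper)."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Tiny in-memory sample with made-up values, standing in for an exported file.
SAMPLE = (
    "submission,subreddit,created,edited,locked,score\n"
    "abc123,python,1609459200,0,1,42\n"
)
rows = load_submissions(io.StringIO(SAMPLE))
print(rows[0]["created"].isoformat(), rows[0]["locked"])
```

To read a real export, pass a file handle instead of the in-memory sample, for example `open("python.csv", newline="", encoding="utf-8")`, where `python.csv` is a placeholder file name. Columns that are not converted, such as `score`, remain strings.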


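Install python3 and pip3. You will also need git.

Clone the repository:

git clone https://github.com/leukipp/reddit-data
cd reddit-data

Install the dependencies from inside the repository, since requirements.txt is part of it:

sudo apt install libsnappy-dev
pip3 install -r requirements.txt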


Create the file .streamlit/secrets.toml and set the environment variables:

# application

# reddit api

# kaggle api (optional)

Kaggle is only required if you want to upload the dataset on a regular basis. In that case, you will need to create a config/kaggle.json file, similar to the dataset-metadata.json file.


Adapt the start time (unix timestamp) in config/loader.json and run:

python3 data.py <subreddit1> <subreddit2> <subreddit3> ...
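
If you need to work out a start time, a Unix timestamp for a given UTC date can be computed with Python's standard library. This is a minimal sketch: the function name `to_unix_timestamp` is made up for the example, and the name of the matching setting in config/loader.json is not shown here because it depends on that file.

```python
from datetime import datetime, timezone


def to_unix_timestamp(year, month, day):
    """Return the Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


print(to_unix_timestamp(2021, 1, 1))  # 1609459200
```

Passing `tzinfo=timezone.utc` matters here: without it, Python interprets the date in the local time zone, which would shift the start time.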


Feel free to download some of the existing datasets available on Kaggle as well.
