Pushshift Google BigQuery Data Streams
Dataset location
pushshift.rt_reddit
Tables
comments
comments Table Schema
Last modified     Schema                                Total Rows   Total Bytes   Expiration   Time Partitioning          Labels
----------------- ------------------------------------- ------------ ------------- ------------ -------------------------- --------
23 May 01:14:24   |- author: string (required)          0            0                          DAY (field: created_utc)
                  |- author_flair_text: string
                  |- author_flair_css_class: string
                  |- body: string (required)
                  |- created_utc: timestamp
                  |- subreddit: string (required)
                  |- subreddit_id: integer (required)
                  |- parent_id: string
                  |- link_id: string (required)
                  |- id: integer (required)
                  |- send_replies: boolean
                  |- stickied: boolean
                  |- subreddit_type: string
                  |- score: integer
                  |- gilded: integer
                  |- can_gild: boolean
                  |- controversiality: integer
                  |- is_submitter: boolean
                  |- no_follow: boolean
                  |- edited: timestamp
                  |- permalink: string
                  |- retrieved_on: timestamp
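The schema output above shows the table partitioned by day on created_utc, so a filter on that column should let BigQuery skip partitions outside the requested range. The query below is a minimal sketch of that pattern. The one-day window, the grouping on subreddit_type, and the comments_last_day alias are illustrative choices, not part of the original documentation.

```sql
-- Sketch: count comments per subreddit type over the last day.
-- The created_utc filter lines up with the table's DAY partitioning,
-- so only the most recent partitions should need to be scanned.
SELECT
  subreddit_type,
  COUNT(*) AS comments_last_day
FROM `pushshift.rt_reddit.comments`
WHERE created_utc >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY subreddit_type
ORDER BY comments_last_day DESC
```

Queries that omit a created_utc filter would presumably scan every partition, so it is usually worth keeping one in place.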
Standard SQL Query Examples
How many comments have been made in the last minute?
SELECT COUNT(*) AS comment_count
FROM `pushshift.rt_reddit.comments`
WHERE created_utc > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE)
What are the most active subreddits over the past five minutes?
SELECT subreddit, COUNT(*) AS comment_count
FROM `pushshift.rt_reddit.comments`
WHERE created_utc > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
GROUP BY subreddit
ORDER BY comment_count DESC
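The same time-window filter can be combined with TIMESTAMP_TRUNC to see how comment volume changes from minute to minute. This is a sketch only. The minute_bucket and comment_count aliases are names chosen here for illustration.

```sql
-- Sketch: comments per minute over the past five minutes.
-- TIMESTAMP_TRUNC rounds each created_utc down to the start of its minute,
-- so every row in the same minute falls into the same bucket.
SELECT
  TIMESTAMP_TRUNC(created_utc, MINUTE) AS minute_bucket,
  COUNT(*) AS comment_count
FROM `pushshift.rt_reddit.comments`
WHERE created_utc > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
GROUP BY minute_bucket
ORDER BY minute_bucket
```

The first and last buckets will usually be partial minutes, because the five-minute window rarely starts on a minute boundary and the current minute is still in progress.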