R + ggplot2 code for a very quick data visualization for Reddit Mean Submission Score by Subreddit
The data was retrieved from BigQuery via this query:
#standardSQL
WITH
top_subreddits AS (
SELECT
subreddit
FROM
`fh-bigquery.reddit_posts.*`
WHERE
_TABLE_SUFFIX BETWEEN '2019_01' AND '2019_08'
AND score >= 5
AND LENGTH(title) >= 8
GROUP BY
subreddit
HAVING
COUNT(*) >= 2000
ORDER BY
APPROX_COUNT_DISTINCT(author) DESC
LIMIT
50 )
SELECT
subreddit,
LENGTH(title) as title_length,
AVG(score) as avg_score
FROM
`fh-bigquery.reddit_posts.*`
WHERE
_TABLE_SUFFIX BETWEEN '2017_01' AND '2019_08'
AND LENGTH(title) <= 300
AND subreddit IN (select subreddit from top_subreddits)
GROUP BY subreddit, title_length
ORDER BY subreddit, title_length
Max Woolf (@minimaxir)
Max's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.
MIT