This repository contains the code behind the analysis in the "Characterizing Reddit Participation of Users who Engage in the QAnon Conspiracy Theories" paper (CSCW '22). The data are located here.
File locations
Input data can be found in the inputFiles folder.
Output files can be found in the outputFiles folder.
Repository setup
pip install -r requirements.txt
Input data
Hashed_Q_Submissions_Raw_Combined.csv
Key Stats
2,099,875 unique submissions
from 13,182 unique Q-users across 50,002 unique subreddits
posted from October 28, 2016 to January 23, 2021
collected using the Pushshift API (psaw)
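As a sanity check on a local copy of the data, the figures above can be recomputed with pandas. This is a minimal sketch that assumes pandas is installed and that the file exposes the id, author, subreddit and date_created columns described under Column Information. The helper name key_stats is hypothetical and is not part of this repository.

```python
import pandas as pd


def key_stats(df: pd.DataFrame) -> dict:
    """Recompute the headline counts for a submissions or comments table.

    Assumes the id, author, subreddit and date_created columns documented
    in this README; key_stats is a hypothetical helper, not repository code.
    """
    dates = pd.to_datetime(df["date_created"])  # values are UTC per the column table
    return {
        "unique_ids": df["id"].nunique(),
        "unique_authors": df["author"].nunique(),
        "unique_subreddits": df["subreddit"].nunique(),
        "first_created": dates.min(),
        "last_created": dates.max(),
    }
```

For example, key_stats(pd.read_csv("inputFiles/Hashed_Q_Submissions_Raw_Combined.csv")) should report counts matching the figures above if the local file is the one described here; the path assumes the CSV sits in the inputFiles folder.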
Column Information
Column | Data Type | Definition
subreddit | object | name of the subreddit where the submission was posted
id | object | submission id
score | int64 | submission score (upvotes minus downvotes)
numReplies | int64 | number of comments on the submission
author | object | submission author's Reddit username (hashed)
title | object | submission title
text | object | submission selftext; empty for link posts
is_self | bool | True if the submission is a text-only (self) post
domain | object | domain of the linked URL; if there is no URL, the domain of the submission permalink
url | object | full URL the submission links to; if there is no URL, the submission permalink
permalink | object | submission permalink
date_created | datetime64[ns] | submission creation date and time (UTC)
type | object | record type: submission or comment
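The column table maps directly onto pandas dtypes. Below is a minimal sketch of a loader that enforces those dtypes, assuming pandas is installed. load_submissions and SUBMISSION_DTYPES are hypothetical names, not functions shipped with the repository. Columns that are not in the table (such as upvote_ratio, which appears in the sample) are left as parsed.

```python
import pandas as pd

# Column -> dtype, following the column table (date_created is parsed separately).
SUBMISSION_DTYPES = {
    "subreddit": "object",
    "id": "object",
    "score": "int64",       # raises if the column contains missing values
    "numReplies": "int64",  # raises if the column contains missing values
    "author": "object",
    "title": "object",
    "text": "object",
    "is_self": "bool",
    "domain": "object",
    "url": "object",
    "permalink": "object",
    "type": "object",
}


def load_submissions(path_or_buffer) -> pd.DataFrame:
    """Read a submissions CSV and coerce the documented columns (hypothetical helper)."""
    df = pd.read_csv(path_or_buffer)
    # Only cast columns that are actually present in this file.
    present = {col: dtype for col, dtype in SUBMISSION_DTYPES.items() if col in df.columns}
    df = df.astype(present)
    # Timestamps are UTC; they are kept timezone-naive to match datetime64[ns].
    df["date_created"] = pd.to_datetime(df["date_created"])
    if "text" in df.columns:
        # Link posts have no selftext, so read_csv yields NaN; use "" for string operations.
        df["text"] = df["text"].fillna("")
    return df
```

Keeping the dtype map separate from the loader makes it easy to compare against the column table if the schema changes.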
Sample
index | subreddit | id | score | numReplies | author | title | text | is_self | domain | url | permalink | upvote_ratio | date_created
0 | greatawakening | 8xuv4i | 1 | 14 | 879f283b831c13474e219e88663d95b0763cca9b | I’ve been writing “Trump Lives Here” on my $20’s after seeing the stamp idea advertised, but seeing this posted again gave me a better idea...going to start writing #QANON or #GreatAwakening on all my bills now!
10,831,922 unique comments
from 11,210 unique Q-users across 36,947 unique subreddits
posted from October 28, 2016 to January 23, 2021
collected using the Pushshift API (psaw)
Column Information
Column | Data Type | Definition
id | object | comment id
link_id | object | id of the submission the comment belongs to, with a t3_ prefix
parent_id | object | id of the item the comment replies to: a comment (t1_ prefix) or, for top-level comments, the submission (t3_ prefix)
author | object | comment author's Reddit username (hashed)
subreddit | object | name of the subreddit where the comment was posted
body | object | comment text
date_created | datetime64[ns] | comment creation date and time (UTC)
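Because link_id and parent_id carry Reddit's type prefixes (t3_ for submissions, t1_ for comments), a comment can be joined to its submission by stripping the prefix from link_id. The sketch below assumes pandas and two DataFrames with the columns documented in this README; strip_fullname and attach_submissions are hypothetical helpers. A left join is used because a comment may reply to a submission that is not in the submissions file (for example, one posted by a user outside the Q-user set), in which case the submission columns are left empty.

```python
import pandas as pd


def strip_fullname(fullname: str) -> str:
    """Drop Reddit's type prefix, e.g. 't3_8qy7gp' -> '8qy7gp' (hypothetical helper)."""
    return fullname.split("_", 1)[1] if "_" in fullname else fullname


def attach_submissions(comments: pd.DataFrame, submissions: pd.DataFrame) -> pd.DataFrame:
    """Add submission_id, is_top_level and the parent submission's subreddit/title."""
    comments = comments.copy()
    comments["submission_id"] = comments["link_id"].map(strip_fullname)
    # A parent_id with the t3_ prefix means the comment replies directly to the submission.
    comments["is_top_level"] = comments["parent_id"].str.startswith("t3_")
    parents = submissions[["id", "subreddit", "title"]].rename(
        columns={"id": "submission_id", "subreddit": "submission_subreddit"}
    )
    # Left join keeps comments whose submission is not in the submissions file.
    return comments.merge(parents, on="submission_id", how="left")
```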
Sample
index | id | link_id | parent_id | author | subreddit | body | date_created
0 | e0mztbn | t3_8qy7gp | t3_8qy7gp | 182c774799aac38a84f5117fc59cde99b0df19af | greatawakening | My account is new because i lost my password to my last account | 2018-06-14 02:17:37
1 | e0n0e9q | t3_8qy9wy | t3_8qy9wy | 182c774799aac38a84f5117fc59cde99b0df19af | greatawakening | new account only because i lost the password to my old account. you can see my post history at JaM0k3 if you doubt my authenticity and genuine concern
1,571 unique sampled submissions
across 11 combinations of 6 relation labels and 9 topic labels
coded for 7 harmful content labels
Column Information
Column | Data Type | Definition
subreddit | object | name of the sampled subreddit
title | object | submission title
text | object | submission selftext
date_created | object | submission creation date and time (UTC), stored as text
url | object | full URL the submission links to; if there is no URL, the submission permalink
Reconcile Code1 | object | reconciled qualitative code for the submission (one of the 7 harmful content labels)
Reconcile Code2 | object | reconciled qualitative code for the submission (one of the 7 harmful content labels)
Reconcile Code3 | object | reconciled qualitative code for the submission (one of the 7 harmful content labels)
Reconcile Code4 | object | reconciled qualitative code for the submission (one of the 7 harmful content labels)
Source | object | if applicable, one of: Source inaccessible, Youtube news commentary, Mischaracterization of fact
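Each sampled submission can carry up to four reconciled codes. One way to tally how often each harmful content label was applied is to reshape the four Reconcile Code columns into a single column and count. This sketch assumes pandas and that empty cells mean no further code; the label strings themselves are not listed in this README, so the function simply counts whatever values appear. code_counts is a hypothetical helper, and counting a label repeated within one row only once is a design choice, not something the paper specifies.

```python
import pandas as pd

CODE_COLUMNS = ["Reconcile Code1", "Reconcile Code2", "Reconcile Code3", "Reconcile Code4"]


def code_counts(coded: pd.DataFrame) -> pd.Series:
    """Count the submissions that received each code (hypothetical helper)."""
    long = (
        coded[CODE_COLUMNS]
        .assign(row=range(len(coded)))  # stable row key independent of the index
        .melt(id_vars="row", value_vars=CODE_COLUMNS, value_name="code")
        .dropna(subset=["code"])  # empty cells carry no code
    )
    long["code"] = long["code"].astype(str).str.strip()
    # Drop blank strings, then count a code at most once per submission.
    long = long[long["code"] != ""].drop_duplicates(subset=["row", "code"])
    return long["code"].value_counts()
```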
Sample
index | subreddit | title | text | date_created | url | Reconcile Code1 | Reconcile Code2 | Reconcile Code3 | Reconcile Code4 | Source
0 | tulsi | Why I’m asking Tom Perez to resign | Under the leadership of Tom Perez, the DNC has kowtowed to billionaires, caused a debacle in Iowa, and undermined the voter’s trust in our elections. The American people are understandably feeling confused and disheartened by a process that should be empowering