
{\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\fnil\fcharset0 Verdana;}

\f0\fs40 \cf0 This is the data that will be used for our final real-world testing. It consists of data scraped from 30 different subreddits on the website www.reddit.com. Each subreddit is its own distinct community, but there may be overlap in the topics discussed. Despite any overlap, this data has hard classifications.\
1) Obtaining the data\
I obtained the list of subreddits to pull from by repeatedly using Reddit\'92s \'91random subreddit\'92 feature until it took me to a subreddit with a large enough user base (> 100,000 subscribers with a max karma > 5000). I then recorded that subreddit in my list. \
Once I had 30 subreddits, I began pulling the titles of the top posts of all time on each subreddit. I chose these posts because they are the most liked in their subreddit and will thus hopefully be representative of the subreddit\'92s culture. Please note that Reddit is not always safe for work, and some of these subreddit names and titles can be slightly lewd. For the sake of randomness, I have not filtered the results for content whatsoever. Reddit\'92s random feature only selects recently active subreddits, so it is not a site-wide random feature. Details on its functionality can be found here: {\field{\*\fldinst{HYPERLINK "https://github.com/reddit/reddit/blob/20f061eab3385330a26dc9192707a0194f73163a/r2/r2/models/subreddit.py#L1065"}}{\fldrslt 
\f1\fs28 \cf2 \cb3 \expnd0\expndtw0\kerning0
\f1\fs28 \cf2 \cb3 \expnd0\expndtw0\kerning0
\f0\fs40 \cf0 \cb1 \kerning1\expnd0\expndtw0 \
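The selection procedure from section 1 can be sketched roughly as follows. This is only an illustration, not the script I actually ran: the function names, the dictionary keys (name, subscribers, top_score), the max_draws cap and the skipping of repeated draws are all assumptions made for the example, and "max karma" is read here as the score of the subreddit's highest-rated post. The draw_random argument stands in for whatever returns one randomly chosen subreddit, so the sketch makes no network calls.

```python
from typing import Callable, Dict, List

MIN_SUBSCRIBERS = 100_000  # subscriber threshold stated in section 1
MIN_TOP_SCORE = 5_000      # "max karma" threshold; assumed to mean the top post's score
TARGET_COUNT = 30          # number of subreddits in the final list


def is_large_enough(info: Dict) -> bool:
    """Return True when both thresholds are strictly exceeded."""
    return (info["subscribers"] > MIN_SUBSCRIBERS
            and info["top_score"] > MIN_TOP_SCORE)


def collect_subreddits(draw_random: Callable[[], Dict],
                       target: int = TARGET_COUNT,
                       max_draws: int = 10_000) -> List[str]:
    """Draw random subreddits until `target` qualifying names are recorded.

    `draw_random` is a hypothetical callable returning a dict with the keys
    "name", "subscribers" and "top_score". Repeated names are skipped
    (an assumption; the text does not say how repeats were handled).
    """
    chosen: List[str] = []
    for _ in range(max_draws):
        if len(chosen) >= target:
            break  # stop drawing as soon as the list is full
        info = draw_random()
        if is_large_enough(info) and info["name"] not in chosen:
            chosen.append(info["name"])
    return chosen
```

Because the thresholds are checked with strict comparisons, a subreddit with exactly 100,000 subscribers would be skipped, which matches the ">" in the description above.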
2) List of subreddits\
ProRevenge, gunporn, FlashTV, awwwtf, wellworn,\
AnimalTextGifs, iamverybadass, shittyrobots, mildlypenis, wtfstockphotos,\
gamephysics, wellthatsucks, wiiu, conservative, thesimpsons,\
GrandtheftautoV, hiking, justfuckmyshitup, pixelart, ineeeedit,\
Warframe, keto, grilledcheese, motercycles, scottishpeopletwitter,\
Discordapp, lotr, cscareerquestions, animalsbeingbros, dogberg\