/reddata

Playing with the Reddit public data set


reddata

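I use the Reddit comments dataset to compute related subreddits.

The basic principle of this recommender is "redditors who posted to this subreddit also post to ...". In math terms, I'm just computing the Jaccard index between the sets of redditors who posted to each pair of subreddits: the number of shared redditors divided by the number of redditors who posted to at least one of the two.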
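To make the idea concrete, here is a minimal sketch of that computation. It is not the code from this repository: the `comments` sample, the `author` and `subreddit` field names, and the helper functions `authorsBySubreddit`, `jaccard` and `related` are all assumptions made for illustration.

```javascript
// Hypothetical sample input: one record per comment.
// The field names `author` and `subreddit` are assumed for this sketch.
const comments = [
  { author: 'alice', subreddit: 'programming' },
  { author: 'bob', subreddit: 'programming' },
  { author: 'carol', subreddit: 'programming' },
  { author: 'alice', subreddit: 'gamedev' },
  { author: 'bob', subreddit: 'gamedev' },
  { author: 'dave', subreddit: 'gamedev' },
  { author: 'carol', subreddit: 'vim' },
];

// Build a Map from subreddit name to the Set of distinct authors who posted there.
function authorsBySubreddit(records) {
  const index = new Map();
  for (const { author, subreddit } of records) {
    if (!index.has(subreddit)) index.set(subreddit, new Set());
    index.get(subreddit).add(author);
  }
  return index;
}

// Jaccard index of two Sets: |A ∩ B| / |A ∪ B| (defined as 0 when both are empty).
function jaccard(a, b) {
  let intersection = 0;
  for (const item of a) {
    if (b.has(item)) intersection += 1;
  }
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Rank every other subreddit by its similarity to the given one, highest first.
function related(subreddit, index) {
  const target = index.get(subreddit);
  if (!target) return [];
  const scores = [];
  for (const [name, authors] of index) {
    if (name === subreddit) continue;
    scores.push({ name, score: jaccard(target, authors) });
  }
  return scores.sort((x, y) => y.score - x.score);
}

const index = authorsBySubreddit(comments);
console.log(related('programming', index));
```

In this toy sample, /r/programming and /r/gamedev share two of four distinct authors, so /r/gamedev ranks first with a score of 0.5. Comparing every pair of subreddits this way is quadratic in the number of subreddits, so a real run over the full dataset would likely need something smarter than this loop.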

The code is not supposed to be looked at, but check out these early results:

related to /r/programming

related to /r/gamedev

related to /r/vim

related to /r/visualization

related to /r/Seattle

related to /r/nyc

While the results look very promising for subreddits with fewer than one million subscribers, more popular subreddits unfortunately get their results saturated with other popular subreddits:

related to /r/books

If you have an idea how to fix this, please let me know :)

license

MIT