This project is a work in progress. It predicts subreddits fairly well, with an average F1 score of 0.25 across the top 100 subreddits.
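The README does not say how the F1 scores are averaged. A common choice for multi-class problems like this is the macro average: compute F1 for each subreddit separately, then take the unweighted mean, so that small subreddits count as much as large ones. The sketch below shows that calculation under this assumption. `macro_f1` is a hypothetical helper, not code from this project. It should roughly match scikit-learn's `f1_score(..., average="macro")`.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro average).

    Assumption: "average F1" means macro-averaged F1 over every label that
    appears in either the true or the predicted subreddits.
    """
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted this subreddit, but it was wrong
            fn[true] += 1  # missed the actual subreddit
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with three subreddits:
true = ["askreddit", "pics", "pics", "news"]
pred = ["askreddit", "pics", "news", "news"]
print(round(macro_f1(true, pred), 4))  # 0.7778
```

In this toy case, `askreddit` scores 1.0 and `pics` and `news` each score 2/3, so the macro average is 7/9. A micro average would instead pool all predictions, which lets the largest subreddits dominate the result.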
However, I'm planning on a major rewrite using a neural net.
I'm using the dataset of every publicly available Reddit comment, posted here: https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
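The sketch below shows one way to restrict the data to the top 100 subreddits. It assumes that the dump stores one JSON object per line and that each object has `body` and `subreddit` fields. This is an assumption about the file format, not something this README states. `top_subreddit_comments` is a hypothetical helper, not this project's loader.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_subreddit_comments(lines: Iterable[str], n: int = 100) -> List[Tuple[str, str]]:
    """Return (body, subreddit) pairs, keeping only the n most common subreddits.

    Assumes each non-empty line is a JSON object with "body" and "subreddit"
    keys. Everything is held in memory, so this only suits a sample.
    """
    comments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        comments.append((record["body"], record["subreddit"]))
    counts = Counter(sub for _, sub in comments)
    top = {sub for sub, _ in counts.most_common(n)}
    return [(body, sub) for body, sub in comments if sub in top]
```

If the monthly files are bz2-compressed, `lines` could come from `bz2.open(path, "rt")`. For the full dump, a two-pass approach would avoid holding every comment in memory: count subreddits in the first pass, then filter in the second.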