/reddit-upvote-modeling

Predicting how many upvotes a comment will get, given the comment text, user history, sub-reddit and thread details.

Primary Language: Jupyter Notebook

Predicting Reddit Comment Upvotes

Contributors:

  • Adam Reevesman
  • Gokul Krishna Guruswamy
  • Hai Le
  • Maximillian Alfaro
  • Prakhar Agrawal

Resources in this repository

The code for this project is divided into several notebooks, each covering one part of the workflow.

The finalized notebooks that make up the complete project include:

Objective

Original: To predict how many upvotes a comment will get, given the comment text, user history, sub-reddit and thread details.

Next Step: Improve current model performance.

Data Source

We use two sources of data:

Models and Metrics

We tried linear and nonlinear regression models and compared their Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R² values. The Random Forest had the lowest MAE, around 8, meaning its predictions are off by about 8 upvotes on average.

RMSE penalizes large errors more heavily than MAE does. The magnitude of the RMSE values across our models suggests that they make many large errors.
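To illustrate how the three metrics relate, here is a minimal sketch in plain Python. The upvote counts and both sets of predictions are made up for illustration and are not taken from our data or models; the helper names `mae`, `rmse`, and `r2` are also hypothetical. Both prediction sets have the same MAE of 8, but the one with a single large miss has a much higher RMSE.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: squares errors first, so large misses dominate."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """R^2: one minus the ratio of residual variation to total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up upvote counts: four quiet comments and one popular one.
y_true = [1, 2, 3, 4, 200]

# Hypothetical predictions, chosen so both have the same MAE.
even_errors = [9, 10, 11, 12, 208]  # every prediction is off by 8
one_big_miss = [1, 2, 3, 4, 160]    # four exact, one off by 40

for name, y_pred in [("even errors", even_errors), ("one big miss", one_big_miss)]:
    print(
        f"{name}: MAE={mae(y_true, y_pred):.2f}, "
        f"RMSE={rmse(y_true, y_pred):.2f}, R2={r2(y_true, y_pred):.3f}"
    )
```

When RMSE is far above MAE, as in the second case, a few large errors account for most of the total error, which is the pattern described above.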

Reference Papers/Write-ups

FAQ about reddit