/DSI-RedditNLP

Primary Language: Jupyter Notebook

Project 3: Web APIs & NLP

Executive Summary

  • Project Question:
    • Can a trained classification model predict which of two subreddits a post came from?
  • Project Answer:
    • Not better than people, yet.

Description

For Project 3, the goal was twofold:

  1. Use PRAW to collect posts from two subreddits of my choosing.
  2. Use NLP to train a classifier that predicts which subreddit a given post came from.
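
As a rough illustration of step 1, the sketch below shows one way posts could be pulled with PRAW and tagged with their source subreddit. It is not the project's actual script: the helper names `make_client` and `collect_posts`, the choice of the `new` listing, and the fields kept (`id`, `title`, `selftext`) are all assumptions made for this example, and the credentials are placeholders you would supply from your own Reddit app.

```python
from typing import Any, Dict, List


def make_client(client_id: str, client_secret: str, user_agent: str) -> Any:
    """Build a read-only PRAW client (hypothetical helper; needs `pip install praw`)."""
    import praw  # imported here so collect_posts can be used and tested without PRAW

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def collect_posts(
    reddit: Any, subreddit_names: List[str], limit: int = 100
) -> List[Dict[str, str]]:
    """Fetch up to `limit` newest posts per subreddit, labelled by subreddit name."""
    rows = []
    for name in subreddit_names:
        # subreddit(...).new(limit=...) yields Submission objects, newest first
        for post in reddit.subreddit(name).new(limit=limit):
            rows.append(
                {
                    "id": post.id,
                    "title": post.title,
                    "selftext": post.selftext,
                    "subreddit": name,  # this becomes the classification target
                }
            )
    return rows
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame(rows)` for cleaning, with the `subreddit` column serving as the label for step 2.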

Process / procedures

  • Gathered and prepared data from two subreddits using PRAW.
    • First in the example notebook, later via an exported script.
  • Performed some data cleaning on the pulled posts.
  • Created and compared two models.
    • Tuned both with GridSearchCV.
  • Produced exploratory data visualizations.
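
The model-comparison step above can be sketched roughly as follows. This README does not say which two models were used, so the pairing of logistic regression and multinomial naive Bayes here is purely an assumption for illustration, as are the `compare_models` helper, the `CountVectorizer` features, and the parameter grids.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def compare_models(texts, labels, cv=5):
    """Grid-search two text classifiers and return each one's best CV accuracy and params."""
    # Assumed candidates; the project's actual two models are not named in this README.
    candidates = {
        "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
        "nb": (MultinomialNB(), {"clf__alpha": [0.1, 1.0]}),
    }
    results = {}
    for name, (clf, clf_grid) in candidates.items():
        # Vectorizer and classifier share one pipeline so that each CV fold
        # fits its vocabulary on that fold's training split only.
        pipe = Pipeline([("vec", CountVectorizer()), ("clf", clf)])
        param_grid = {"vec__ngram_range": [(1, 1), (1, 2)], **clf_grid}
        search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
        search.fit(texts, labels)
        results[name] = (search.best_score_, search.best_params_)
    return results
```

Calling `compare_models(df["title"], df["subreddit"])` on a cleaned DataFrame (column names assumed) returns a dictionary keyed by model name, which makes it easy to see which candidate cross-validates better before evaluating on a held-out test set.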