/reddit-nlp

Scrape subreddits, multiple save and load options, automation

Primary Language: Jupyter Notebook

Subreddit Classification Using NLP

A program for scraping subreddits and storing the results in a user-chosen format, then loading data from multiple sources into a standard format for analysis and NLP classification.
Data can be saved to and extracted from multiple sources.
Configuration files and class definitions separate structure from function.

Description | Features | Future Features | File Descriptions | How To Use | Requirements | Credits | License

Description

Project Purpose:

Scrape subreddits with a simple script whose sort order and save method are easy to configure, with support for several storage formats and locations. Load data from multiple sources into a standard format for analysis, visualization, or NLP.
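As a rough illustration of the save-then-load idea, the sketch below writes posts as CSV or SQLite depending on a config dictionary and reads them back into one common shape. It is not the project's actual code: the `FIELDS` schema, the `save_method` and `path` config keys, the `posts` table name, and the `save_posts` / `load_posts` functions are all assumptions made for this example, and S3 storage is left out.

```python
import csv
import sqlite3
from pathlib import Path

# Hypothetical standard schema; the project's real column names may differ.
FIELDS = ["subreddit", "title", "selftext"]


def save_posts(posts, config):
    """Write a list of post dicts using the method named in config["save_method"]."""
    path = Path(config["path"])
    method = config["save_method"]
    if method == "csv":
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(posts)
    elif method == "sqlite":
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS posts "
                "(subreddit TEXT, title TEXT, selftext TEXT)"
            )
            conn.executemany(
                "INSERT INTO posts VALUES (:subreddit, :title, :selftext)", posts
            )
            conn.commit()
        finally:
            conn.close()
    else:
        raise ValueError(f"unknown save_method: {method!r}")


def load_posts(config):
    """Read posts back as a list of dicts with the same keys, whatever the source."""
    path = Path(config["path"])
    method = config["save_method"]
    if method == "csv":
        with path.open(newline="", encoding="utf-8") as f:
            return [dict(row) for row in csv.DictReader(f)]
    if method == "sqlite":
        conn = sqlite3.connect(path)
        try:
            conn.row_factory = sqlite3.Row  # rows become mapping-like
            rows = conn.execute(
                "SELECT subreddit, title, selftext FROM posts"
            ).fetchall()
            return [dict(row) for row in rows]
        finally:
            conn.close()
    raise ValueError(f"unknown save_method: {method!r}")
```

Because both loaders return the same list-of-dicts shape, downstream analysis code does not need to know which storage backend was used.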

Features

  • Scrapes a list of subreddits
  • Saves data as CSV or SQLite to either a local folder or S3
  • Option to run the scraper from the command line with a config file
  • Loads data in a standard format, ready for modeling
  • NLP classification of a post's subreddit from its title and description
  • Class and functions to run multiple models and compare results.
  • Visualizations of most and least common words between subreddits.
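The word-comparison feature in the list above could be approximated along the following lines. This is only a sketch under stated assumptions: the post keys (`subreddit`, `title`, `selftext`), the simple regex tokenizer, and the helper names `word_counts` and `distinctive_words` are hypothetical, and ranking by raw count difference is a stand-in for whatever measure the project actually plots.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(posts):
    """Count words per subreddit across each post's title and body."""
    counts = {}
    for post in posts:
        text = f"{post['title']} {post['selftext']}"
        counts.setdefault(post["subreddit"], Counter()).update(tokenize(text))
    return counts


def distinctive_words(counts, subreddit, other, n=5):
    """Return up to n words that appear more often in `subreddit` than in `other`."""
    diff = Counter(counts[subreddit])
    diff.subtract(counts[other])  # negative values mean more common in `other`
    return [word for word, delta in diff.most_common(n) if delta > 0]
```

The resulting lists can be passed to any plotting library to draw bar charts of the words that best separate two subreddits.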

License

MIT