A program for scraping subreddits and storing the results in a user-chosen format, then loading data from multiple sources into a standard format for analysis and NLP classification.
Saves and extracts data from multiple sources.
Uses configuration files and class definitions to separate structure and function.
Description •
Features •
Future Features •
File Descriptions •
How To Use •
Requirements •
Credits •
License
Scrape subreddits with a simple script whose sort order and save method are easy to configure, with several storage options to choose from. Load data from multiple sources into a standard format for analysis, visualization, or NLP.
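As a rough illustration of the configurable save and load idea, here is a minimal sketch using only the Python standard library. The function names (`save_posts`, `load_posts`), the config keys (`save_method`, `path`), and the three-column schema are assumptions made for this example and are not taken from the project's code; S3 storage is left out.

```python
import csv
import sqlite3

# Hypothetical schema for illustration; the project's real fields may differ.
FIELDS = ["subreddit", "title", "description"]


def save_posts(posts, config):
    """Save a list of post dicts using the method named in the config."""
    method = config["save_method"]
    if method == "csv":
        with open(config["path"], "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(posts)
    elif method == "sqlite":
        conn = sqlite3.connect(config["path"])
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS posts "
                "(subreddit TEXT, title TEXT, description TEXT)"
            )
            conn.executemany(
                "INSERT INTO posts VALUES (:subreddit, :title, :description)",
                posts,
            )
            conn.commit()
        finally:
            conn.close()
    else:
        raise ValueError(f"Unknown save_method: {method!r}")


def load_posts(config):
    """Load posts back as a list of dicts, whichever source they came from."""
    method = config["save_method"]
    if method == "csv":
        with open(config["path"], newline="", encoding="utf-8") as f:
            return [dict(row) for row in csv.DictReader(f)]
    if method == "sqlite":
        conn = sqlite3.connect(config["path"])
        try:
            conn.row_factory = sqlite3.Row
            rows = conn.execute(
                "SELECT subreddit, title, description FROM posts"
            ).fetchall()
            return [dict(row) for row in rows]
        finally:
            conn.close()
    raise ValueError(f"Unknown save_method: {method!r}")
```

Because both loaders return the same list-of-dicts shape, downstream analysis code does not need to know which storage method was used.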
- Scrapes a list of subreddits
- Saves data as CSV or SQLite to either a local folder or S3
- Option to run the scraper from the command line with a config file
- Loads data in a standard format ready for modeling
- NLP classification of a post's subreddit from its title and description
- Classes and functions to run multiple models and compare results
- Visualizations of most and least common words between subreddits.
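
The word comparison behind the last feature could be approached along these lines. This is only a sketch under assumptions: the helper names (`word_counts`, `distinctive_words`), the simple regex tokenizer, and the raw count difference used for ranking are all illustrative choices, not the project's actual method, and no plotting is shown.

```python
import re
from collections import Counter


def word_counts(titles):
    """Count lowercase word occurrences across a list of post titles."""
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"[a-z']+", title.lower()))
    return counts


def distinctive_words(titles_a, titles_b, top_n=5):
    """Return words most over-represented in A relative to B.

    Ranks by raw count difference (count in A minus count in B),
    breaking ties alphabetically so the result is deterministic.
    """
    a = word_counts(titles_a)
    b = word_counts(titles_b)
    diff = {word: a[word] - b[word] for word in set(a) | set(b)}
    ranked = sorted(diff, key=lambda word: (-diff[word], word))
    return ranked[:top_n]
```

Swapping the two arguments gives the words that favor the other subreddit, and the ranked lists could then be passed to any plotting library.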