
👾 gdq-collector

Data Collection Utilities for GDQStatus

(Successor to sgdq-collector)

Explanation

gdq-collector is an amalgamation of services and utilities designed to be a serverless-ish backend for gdq-stats. There are two distinct components of the project:

  • gdq_collector (the python module) - Python scraping module designed to run continuously on a compute platform like EC2; it updates a Postgres database with new timeseries and GDQ schedule data.
  • lambda_suite - Lambda application that caches the Postgres database to a JSON file in S3 (to reduce load on the database). It also includes a simple API for querying recent timeseries data that doesn't yet appear in the cached JSON. The Lambda Suite has three stages (separate configurations that are deployed independently):
    • The API stage (dev/prod) - Serves recent data to a public-facing REST endpoint (a hedged sketch of such an endpoint appears after this list)
    • The Caching stage (cache_databases) - Queries Postgres database and stores query results in S3 as a cache
    • The Monitoring stage (monitoring) - Queries the API stage to do periodic health checks on the system
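
The details of the API stage are project-specific, but as a hedged illustration, a minimal version of its "recent data" endpoint could be a small Flask app (Zappa deploys Flask/WSGI apps). The table and column names below are assumptions for illustration; the real schema lives in schema.sql.

    import psycopg2
    from flask import Flask, jsonify

    import credentials  # assumed to hold the Postgres connection settings

    app = Flask(__name__)

    @app.route("/recent")
    def recent():
        """Return the window of data too new to be in the cached JSON yet."""
        conn = psycopg2.connect(**credentials.postgres)
        try:
            with conn.cursor() as cur:
                # Hypothetical table/column names, for illustration only.
                cur.execute(
                    "SELECT time, viewers, donations FROM gdq_timeseries "
                    "WHERE time > now() - interval '1 hour' ORDER BY time"
                )
                rows = cur.fetchall()
        finally:
            conn.close()
        return jsonify([
            {"time": t.isoformat(), "viewers": v, "donations": float(d)}
            for t, v, d in rows
        ])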

gdq_collector uses APScheduler to schedule and execute the scraping / refreshing tasks.
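
As a rough sketch of that pattern (the job names and intervals here are illustrative, not the module's actual entry points):

    from apscheduler.schedulers.blocking import BlockingScheduler

    def scrape_timeseries():
        # Poll Twitch / the donation tracker and insert a new row into Postgres.
        pass

    def refresh_schedule():
        # Re-scrape the GDQ schedule and update the schedule table.
        pass

    sched = BlockingScheduler()
    sched.add_job(scrape_timeseries, "interval", minutes=1)
    sched.add_job(refresh_schedule, "interval", minutes=10)
    sched.start()  # Blocks and runs the jobs on their configured intervals.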

The Lambda applications use Zappa for deployment.

Architecture Diagram

(Architecture diagram: aws_setup)

Building / Running

gdq_collector

Note: If you're running this on an Ubuntu EC2 instance, bootstrap_aws.sh will be more useful than the following steps for machine-specific setup.

  1. Clone the repo and cd into the root project directory.
  2. Pull down the dependencies with pip install -r requirements.txt --user
    • You may wish to run aws/install.sh, as there will be necessary system dependencies to install some of the python packages.
  3. Copy credentials_template.py to credentials.py. Fill in your credentials for Twitch and your Postgres server. (A hedged example of what credentials.py might contain appears after this list.)
    • You'll need to register a new Twitch application to get your client ID.
    • You'll want to use this site to generate an OAuth code for Twitch.
  4. Ensure your Postgres server is running and that your credentials are valid. Create the necessary tables by executing the SQL commands in schema.sql.
  5. Run python -m gdq_collector to start the collector.
    • You can run python -m gdq_collector --help to learn about the optional command line args.
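
For reference, credentials.py ends up looking roughly like the sketch below. Treat it as illustrative only; the authoritative field names are whatever credentials_template.py actually contains.

    # credentials.py (illustrative field names, not the template's actual ones)
    twitch_client_id = "your-twitch-client-id"
    twitch_oauth = "oauth:xxxxxxxxxxxxxxxx"

    postgres = {
        "host": "localhost",
        "port": 5432,
        "dbname": "gdq",
        "user": "gdq_collector",
        "password": "change-me",
    }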

lambda_suite

  1. Clone the repo and cd into the root project directory.
  2. Pull down the dependencies with pip install -r requirements.txt --user
    • You may wish to run aws/install.sh, as there will be necessary system dependencies to install some of the python packages.
  3. Copy credentials_template.py to credentials.py. Fill in your credentials for your Postgres database. Add the ARN of the monitoring SNS topic to sns_arn if you want to use the lambda functions to send you notifications when monitoring alarms occur.
  4. Update zappa_settings.json to fit your AWS configuration. In particular, you'll need to update your vpc_config. You'll also need to create an S3 bucket whose name matches your S3_CACHE_BUCKET config.
  5. Run zappa deploy dev to deploy the application and schedule the caching operations.
  6. Run zappa deploy cache_databases to deploy the lambdas that cache the Postgres data to JSON blobs in S3 (a hedged sketch of one such caching handler appears after this list).
  7. Run zappa deploy monitoring to deploy the lambdas that check the output of the APIs to detect problems with the collector.
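
For orientation, a cache_databases-style handler boils down to "query Postgres, dump to JSON, upload to S3". The sketch below is hedged: the bucket name, table, and column names are placeholders, not the project's actual configuration.

    import json

    import boto3
    import psycopg2

    import credentials  # assumed to hold the Postgres connection settings

    S3_CACHE_BUCKET = "my-gdq-cache-bucket"  # placeholder; must match your config

    def cache_timeseries(event, context):
        """Cache the timeseries table as a JSON blob in S3."""
        conn = psycopg2.connect(**credentials.postgres)
        try:
            with conn.cursor() as cur:
                # Hypothetical table/column names, for illustration only.
                cur.execute(
                    "SELECT time, viewers, donations FROM gdq_timeseries ORDER BY time"
                )
                rows = [
                    {"time": t.isoformat(), "viewers": v, "donations": float(d)}
                    for t, v, d in cur.fetchall()
                ]
        finally:
            conn.close()
        boto3.client("s3").put_object(
            Bucket=S3_CACHE_BUCKET,
            Key="gdq_timeseries.json",
            Body=json.dumps(rows),
            ContentType="application/json",
        )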