This repo holds scripts to collect energy data, e.g. electricity generation sources and carbon intensity.
We're starting with US electricity data from various ISOs, such as CAISO, MISO, and PJM.
`crawler` holds the scripts that pull data from various sources.
- `crawl.py` runs every minute via `crontab`, invokes the individual parser for each source, and stores the results in a PostgreSQL database. The crawling frequency for each source is defined near the top of this file.
- Individual parsers are copied/derived from electricityMap's sources (MIT licensed).
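As a rough sketch of the parser interface (the function and record names here are illustrative, not the actual electricityMap-derived signatures), each parser might return timestamped generation records that `crawl.py` then writes to the database:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical record shape; the real parsers are derived from
# electricityMap and may use a different structure.
@dataclass
class GenerationRecord:
    source: str         # ISO name, e.g. "MISO"
    timestamp: datetime
    fuel_mix: dict      # fuel type -> megawatts


def parse_miso_example() -> list[GenerationRecord]:
    """Illustrative parser: fetch and normalize one source's data.

    A real parser would fetch from the ISO's endpoint here instead of
    returning a hard-coded record.
    """
    return [GenerationRecord(
        source="MISO",
        timestamp=datetime(2022, 7, 1, tzinfo=timezone.utc),
        fuel_mix={"coal": 25000.0, "wind": 8000.0},
    )]
```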
We are starting with US ISOs, which currently include:
- MISO, which only has current data and is updated every five minutes.
- CAISO, NEISO and NY, which have data for the past few days, so we pull the previous day's full data daily.
- PJM, which only has the current day's data publicly available on their website, updated every hour.
- BPA, which has data for the past two days, so we pull daily.
- SPP, which only has current data for the past two hours, so we pull every hour.
- PR (disabled), which only has current data, but it is stale and always shows 03/24/2022, so it's disabled for now.
- HI (disabled), which has daily historic data, but it stopped after 04/13/2022, so it's disabled for now.
- ERCOT (`US_ERCOT.py`) and PACW, which use the new data source from EIA and have historic data. We plan to migrate other sources to EIA as well to standardize the data sources. (EIA data sources have had a temporary issue since June 2022, which hasn't been fixed as of July 2022.)
You can find the exact list at the top of the main crawler file `crawl.py`.
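The per-source schedules described above could be expressed as a table near the top of `crawl.py`. The following is a simplified sketch under that assumption (the table name and dispatch helper are illustrative, not the actual code):

```python
from datetime import timedelta

# Illustrative crawl-frequency table; the real one lives at the top of crawl.py.
CRAWL_INTERVALS = {
    "MISO": timedelta(minutes=5),   # current data only, updated every 5 min
    "CAISO": timedelta(days=1),     # pull previous day's full data daily
    "NEISO": timedelta(days=1),
    "NY": timedelta(days=1),
    "PJM": timedelta(hours=1),      # current day only, updated hourly
    "BPA": timedelta(days=1),       # past two days available
    "SPP": timedelta(hours=1),      # only past two hours available
}


def sources_due(elapsed_minutes: int) -> list[str]:
    """Return the sources due for a crawl at the given minute.

    crawl.py itself runs every minute via crontab, so a source with a
    5-minute interval fires whenever elapsed_minutes is a multiple of 5.
    """
    return [name for name, interval in CRAWL_INTERVALS.items()
            if elapsed_minutes % int(interval.total_seconds() // 60) == 0]
```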
- The database is currently hosted on a development machine and is only locally accessible (or via SSH tunnel).
- Table definitions are in `database/tables`.
- I used JetBrains DataGrip for quick access to the database and have included the IDE settings.
TODOs:
- External (read-only) data access.
- Data visualization.
This is a work in progress.
We implement a REST API using Flask and Flask-RESTful.
The code is located in `api`, and calls to external APIs are implemented in `api/external`.
The Flask app is deployed using `nginx` + `gunicorn`, which are detailed in the deployment script below. You can also run it locally using `gunicorn` directly by executing `gunicorn "api:create_app()"` in the repo root, or via the VSCode launch script.
Currently, we support:
- Look up the balancing authority for given GPS coordinates (via the WattTime API).
- Look up carbon intensity for given GPS coordinates and a time range.
- (prototype) Carbon-aware multi-region scheduler that assigns workload based on its profile and an optimization algorithm.
The full list is defined in the `api` module.
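The multi-region scheduler's core decision can be sketched as picking the region with the lowest average carbon intensity over the workload's run window. The region names, values, and helper functions below are made up for illustration; the real optimization algorithm in `api` is more involved:

```python
def average_intensity(samples: list[float]) -> float:
    """Mean carbon intensity (gCO2/kWh) over a time range."""
    return sum(samples) / len(samples)


def pick_region(forecasts: dict[str, list[float]]) -> str:
    """Choose the region with the lowest average forecast intensity.

    `forecasts` maps region name -> intensity samples covering the
    workload's expected run window.
    """
    return min(forecasts, key=lambda region: average_intensity(forecasts[region]))


# Illustrative values, not real data.
forecasts = {
    "CAISO": [220.0, 180.0, 150.0],  # solar ramping up
    "MISO": [450.0, 460.0, 440.0],
}
```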
Deployment scripts are in deploy.
The crawler deployment script (`deploy-crawler.sh`) copies the crawler code and relevant scripts to a "production" folder and installs the `run-*.sh` files with appropriate schedules via `crontab`.
Currently, we run:
- Database backup once per day.
- Main crawler once every minute.
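The installed schedules would look roughly like the crontab entries below. The script names and paths are illustrative (they only assume the `run-*.sh` naming mentioned above); the actual entries are written by `deploy-crawler.sh`:

```
# Illustrative crontab entries; actual paths are set by deploy-crawler.sh.
* * * * *  /path/to/production/run-crawler.sh      # main crawler, every minute
0 3 * * *  /path/to/production/run-db-backup.sh    # database backup, once per day
```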
The REST API deployment script (`deploy-rest-api.sh`) copies the `api` code to a "production" folder and reloads `supervisor`, which has been set up to monitor and control the Flask app via `gunicorn`. `nginx` acts as a reverse proxy to `gunicorn`. The entire setup process is documented in `scripts/setup/install-flask-runtime.sh`.