Note: The code and data for PowerGenome are under active development and some changes may break existing functions. Keep up to date with major code and data releases by joining PowerGenome on groups.io. And check out the growing documentation on the Wiki for helpful background information.
Power system optimization models can be used to explore the cost and emission implications of different regulations in future energy systems. One of the most difficult parts of running these models is assembling all the data. A typical model will define several regions, each of which needs data such as:
- All existing generating units (perhaps grouped into a few discrete clusters within each region)
- Transmission constraints between regions
- Hourly load profiles (including new loads from vehicle and building electrification)
- Hourly generation profiles for wind & solar
- Cost estimates for new generating units
Because computational complexity and run times increase with the number of regions and generating unit clusters, a user might want to disaggregate only the regions and generating units close to the primary region of interest. For example, a study focused on clean electricity regulations in New Mexico might combine several states in the Pacific Northwest into a single region while also splitting Arizona combined cycle units into multiple clusters.
The goal of PowerGenome is to let a user make all of these choices in a settings file and then run a single script that generates input files for the power system model. PowerGenome currently generates input files for GenX, and we hope to expand to other models in the near future.
PowerGenome uses data from a number of different sources, including EIA, NREL, and EPA. The data are accessed through a combination of sqlite databases, CSV files, and parquet data files. EIA data on existing generating units are already compiled into a single sqlite database (see instructions for using it below). A second sqlite database has tables with new resource costs from NREL ATB, transmission constraints between IPM regions from EIA, and hourly demand within each IPM region from FERC. There are also a few data files stored in this repository:
- Regional cost multipliers for individual technologies developed by EIA (data/cost_multipliers/AEO_2020_regional_cost_corrections.csv).
- A simplified geojson version of EPA's shapefile for IPM regions (data/ipm_regions_simple.geojson).
- Information on user-defined technologies, which can be included in outputs. This can be used to define a custom cost case (e.g. $500/kW PV) or a new technology such as natural gas with 100% carbon capture. The CSV files are stored in the extra_inputs subfolders of each example system. A documentation file in that folder describes what to include in the file.
This project pulls data from PUDL. As such, it requires installation of PUDL to access a normalized sqlite database and some of the PUDL convenience functions. The catalystcoop.pudl package is included in the environment.yml file and will be installed automatically in the conda environment (see instructions below). Catalyst Cooperative will be creating versioned data releases of PUDL, which can be accessed on Zenodo. Download the zip file from Zenodo, unzip it, and find the sqlite database under pudl_data/sqlite/pudl.sqlite. Note that the version of the catalystcoop.pudl software you need may change based on the database version you use. Look on the right-hand side of the Zenodo archive to see what software version was used to compile the data. If the version in your conda environment does not match the version used to compile the data, you can change it in the environment.yml file or install a different version using conda install catalystcoop.pudl=<your_version>.
IMPORTANT UPDATE: As of December 2021, our pinned catalystcoop.pudl dependency has been bumped from 0.3.* to 0.5.*. This version bump is associated with some changes in the PUDL database structure and an increase in the Pandas dependency from 0.25.* to 1.*. If you are running an older version of PowerGenome it may be easiest to remove the existing powergenome conda environment and reinstall it.
conda remove --name powergenome --all
Alternatively, you can update your existing environment.
conda env update --file environment.yml --prune
Either way, you will need to download the new database files in steps 5/6 below and update your .env file.
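To confirm which catalystcoop.pudl version an environment actually contains, a quick check like the one below can help. This is only a sketch: the helper names installed_version and matches_pin are made up for illustration and are not part of PowerGenome or PUDL, and the wildcard comparison is a simple string match rather than a full version-specifier parser.

```python
from fnmatch import fnmatch
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version: Optional[str], pin: str) -> bool:
    """Compare a version string against a wildcard pin such as "0.5.*".

    Hypothetical helper: a plain wildcard match, not a full version parser.
    """
    return version is not None and fnmatch(version, pin)


if __name__ == "__main__":
    found = installed_version("catalystcoop.pudl")
    if matches_pin(found, "0.5.*"):
        print(f"catalystcoop.pudl {found} matches the 0.5.* pin")
    else:
        print(f"catalystcoop.pudl is {found}; expected 0.5.*")
```

Run it inside the activated powergenome environment. If the reported version does not match, removing and recreating the environment as described above is probably the simplest fix.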
1. Clone this repository to your local machine and navigate to the top level (PowerGenome) folder.
2. Create a conda environment named powergenome using the provided environment.yml file. If you don't already use conda, download and install miniconda. Note that resolving all the dependencies can be slow with conda, so I highly recommend that you install mamba and use it instead (just substitute mamba for conda in the commands below). Mamba installation is easy and will probably take less time than sitting around while conda resolves dependencies.
    conda env create -f environment.yml
   or if you installed mamba:
    mamba env create -f environment.yml
3. Activate the powergenome environment.
    conda activate powergenome
4. pip-install an editable version of this project.
    pip install -e .
5. Download the PUDL database, unzip it, and copy the /pudl_data/sqlite/pudl.sqlite file to wherever you would like to store PowerGenome data on your computer. The zip file contains other data sets that aren't needed for PowerGenome and can be deleted. Note that as of December 2021 the most recent version of this database (Data Release v3.0.0) is compatible with catalystcoop.pudl version 0.5.* and will not work if an earlier version is included in your conda environment.
6. Download additional PowerGenome data that includes NREL ATB cost data, transmission constraints between IPM regions, and hourly demand for each IPM region. Hourly demand is for 2012 and was constructed from FERC 714 data. These files will eventually be provided through a data repository with citation information.
7. Download the renewable resource data containing generation profiles and capacity for existing and new-build renewable resources. Save and unzip this file. The suggested location for all of the unzipped files is PowerGenome/data/resource_groups/. These files will eventually be provided through a data repository with citation information.
8. Get an API key for EIA's OpenData portal. This key is needed to download projected fuel prices from EIA's Annual Energy Outlook.
9. Create the file PowerGenome/powergenome/.env. To this file, add PUDL_DB=YOUR_PATH_HERE (your path to the PUDL database downloaded in step 5), PG_DB=YOUR_PATH_HERE (your path to the additional PowerGenome data downloaded in step 6), EIA_API_KEY=YOUR_KEY_HERE (your EIA API key), and RESOURCE_GROUPS=YOUR_PATH_HERE (your path to where the resource groups data from step 7 are saved). Quotation marks are only needed if your values contain spaces. The .env file is included in .gitignore and will not be synced with the repository. See the SQLAlchemy documentation for examples of how to format the PUDL_DB and PG_DB paths (e.g. sqlite:////<entire path to the folder containing pudl file>/pudl.sqlite, or sqlite:///C:/path/to/folder/pudl.sqlite on Windows). If you get any errors when trying to initiate the PUDL database, go back and check your path formatting against the SQLAlchemy documentation examples.
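Path formatting in the .env file is the most common stumbling block, so a small sanity check can save some time. The sketch below uses only the standard library; the helpers sqlite_url, parse_env, and check_env are hypothetical names for illustration and are not part of PowerGenome. It assumes the four keys described above and a plain KEY=VALUE file layout.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The four keys described in the setup instructions above.
REQUIRED_KEYS = ("PUDL_DB", "PG_DB", "EIA_API_KEY", "RESOURCE_GROUPS")


def sqlite_url(path) -> str:
    """Build a SQLAlchemy-style SQLite URL from an absolute path object."""
    # An absolute POSIX path starts with "/", which yields four slashes in total.
    return "sqlite:///" + path.as_posix()


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotation marks.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_env(values: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    for key in ("PUDL_DB", "PG_DB"):
        url = values.get(key, "")
        if url and not url.startswith("sqlite:///"):
            problems.append(f"{key} should start with sqlite:///")
    return problems


if __name__ == "__main__":
    # Placeholder paths only; substitute your own locations.
    print(sqlite_url(PurePosixPath("/home/user/pg_data/pudl.sqlite")))
    print(sqlite_url(PureWindowsPath("C:/path/to/folder/pudl.sqlite")))
```

To check a real file, read it with Path(".env").read_text() and pass the text through parse_env and check_env. The check only looks at the URL prefix; it does not confirm that the database files exist.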
It is best practice to set up project folders outside of the cloned repository so that git doesn't track any new/changed files within the upper-level PowerGenome folder. Try copying one of the example systems (settings file and extra inputs) and modifying it. Copy the notebooks folder into your project folder, change the path to the settings file as needed, and run code in the notebooks. This can also be a good way to learn how data are created in PowerGenome and to debug problems.
Keeping project folders separate from the cloned PowerGenome folder will also make it easier to pull changes as they are released.
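The copy steps described above can also be scripted. The following is a minimal sketch, assuming the repository layout mentioned in this README (example_systems and notebooks folders at the top level of the clone). The function start_project is a hypothetical helper, not something PowerGenome provides, and the system name you pass in must match a folder that exists in your clone.

```python
import shutil
from pathlib import Path


def start_project(repo_root, system_name, project_dir):
    """Copy one example system and the notebooks folder into a new project folder."""
    repo_root = Path(repo_root).resolve()
    project_dir = Path(project_dir).resolve()

    # Refuse to create the project inside the cloned repository so that
    # git does not pick up new or changed files.
    if project_dir == repo_root or repo_root in project_dir.parents:
        raise ValueError("project_dir should be outside the cloned repository")

    source = repo_root / "example_systems" / system_name
    if not source.is_dir():
        raise FileNotFoundError(f"no example system at {source}")

    # copytree creates project_dir (and any missing parents) and raises an
    # error if project_dir already exists, so nothing is overwritten.
    shutil.copytree(source, project_dir)

    notebooks = repo_root / "notebooks"
    if notebooks.is_dir():
        shutil.copytree(notebooks, project_dir / "notebooks")
    return project_dir
```

After copying, remember to change the settings file path used in the notebooks so that it points at the copy in your project folder.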
A few example systems are included under PowerGenome/example_systems. Each system has a settings file (settings.yml) and a folder with extra user inputs (extra_inputs). The different example systems are not meant to be accurate for real-world analysis, so please do not blindly use the external data files included with them in your own studies!
Settings are controlled in a YAML file. An example settings file (test_settings.yml) and folder with extra user inputs (extra_inputs) are included in each of the example systems. Scenario options across different planning years are defined in the file test_scenario_inputs.csv. Documentation on extra inputs is included in the folder of each example system.
The example notebooks included in PowerGenome/notebooks describe how to access different functions within PowerGenome to create resource clusters, variable generation profiles, fuel costs, hourly demand, and transmission constraints. They include a description of how the data are compiled and the settings parameters that are required for each type of data.
The outputs are all formatted for GenX. We hope to make the data formatting code more modular so that users can easily switch between outputs for different power system models.
Functions from each module can be imported and used in an interactive environment (e.g. JupyterLab). Examples of how to load data in this way are included in PowerGenome/notebooks. To run from the command line, navigate to a project folder that contains a settings file and extra inputs (e.g. myproject/powergenome), activate the powergenome conda environment, and use the command run_powergenome_multiple with flags for the settings file name and where the results should be saved. Since the powergenome package is installed in the powergenome conda environment, you can run the command line function from anywhere on your computer (not just within the cloned PowerGenome folder).
run_powergenome_multiple --settings_file test_settings.yml --results_folder test_system
The command line arguments --settings_file and --results_folder can be shortened to -sf and -rf respectively. For all options, run:
run_powergenome_multiple --help
A folder with extra user inputs is required when using the run_powergenome_multiple command. The name of this folder is defined in the settings YAML file with the input_folder parameter. Look at the files in each example system for test cases to follow.
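If you want to launch runs from a Python script (for example, to loop over several settings files), the command above can be wrapped with subprocess. This is a sketch under a few assumptions: build_command and run_case are hypothetical helper names, only the two flags documented above are used, and the script is started from the activated powergenome environment.

```python
import shutil
import subprocess


def build_command(settings_file, results_folder, short_flags=False):
    """Build the argument list for the run_powergenome_multiple command."""
    if short_flags:
        sf, rf = "-sf", "-rf"
    else:
        sf, rf = "--settings_file", "--results_folder"
    return ["run_powergenome_multiple", sf, str(settings_file), rf, str(results_folder)]


def run_case(settings_file, results_folder, cwd="."):
    """Run one case from the project folder `cwd`; raise if the command fails."""
    # The command is only on PATH when the powergenome environment is active.
    if shutil.which("run_powergenome_multiple") is None:
        raise RuntimeError(
            "run_powergenome_multiple not found; activate the powergenome environment"
        )
    subprocess.run(build_command(settings_file, results_folder), cwd=cwd, check=True)


if __name__ == "__main__":
    # Prints the same command shown above as a list of arguments.
    print(build_command("test_settings.yml", "test_system"))
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.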
If you have previously installed PowerGenome and the run_powergenome_multiple command doesn't work, try reinstalling it using pip install -e . as described above. If you downloaded the custom PUDL database before May of 2020, some errors may be resolved by downloading a new version.
PowerGenome is released under the MIT License. Most data inputs are from US government sources (EIA, EPA, FERC, etc.), which should not be subject to copyright in the US. Hourly FERC demand data have been cleaned using techniques developed by Tyler Ruggles and David Farnham, and allocated to IPM regions using methods developed by Catalyst Cooperative. Hourly generation profiles for wind and solar resources were created by Vibrant Clean Energy and provided without usage restrictions. All PowerGenome data outputs are released under the CC-BY-4.0 license.
Contributions are welcome! There is significant work to do on this project and additional perspective on user needs will help make it better. If you see something that needs to be improved, open an issue. If you have questions or need assistance, join PowerGenome on groups.io and post a message there.
Pull requests are always welcome. To start modifying/adding code, make a fork of this repository, create a new branch, and submit a pull request.
All code added to the project should be formatted with black. After making a fork and cloning it to your own computer, run pre-commit install to install the git hook scripts that will run every time you make a commit. These hooks will automatically run black (in case you forgot), fix trailing whitespace, check yaml formatting, etc.