Power system optimization models can be used to explore the cost and emission implications of different regulations in future energy systems. One of the most difficult parts of running these models is assembling all the data. A typical model will define several regions, each of which needs data such as:
- All existing generating units (perhaps grouped into a few discrete clusters within each region)
- Transmission constraints between regions
- Hourly load profiles (including new loads from vehicle and building electrification)
- Hourly generation profiles for wind & solar
- Cost estimates for new generating units
Because computational complexity and run times increase with the number of regions and generating unit clusters, a user might want to disaggregate only the regions and generating units close to the primary region of interest. For example, a study focused on clean electricity regulations in New Mexico might combine several states in the Pacific Northwest into a single region while also splitting Arizona combined cycle units into multiple clusters.
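The trade-off described above can be illustrated with a minimal sketch. It is not PowerGenome code: the state-to-region mapping, the cluster counts, the technology names, and the `count_model_clusters` helper are all hypothetical, chosen only to mirror the New Mexico example.

```python
from collections import defaultdict

# Hypothetical mapping from states to model regions (illustrative only).
REGION_MAP = {
    "WA": "PNW",
    "OR": "PNW",
    "ID": "PNW",
    "NM": "NM",
    "AZ": "AZ",
}

# Hypothetical number of clusters requested per (region, technology).
# Any group not listed here is collapsed into a single cluster.
NUM_CLUSTERS = {("AZ", "combined_cycle"): 3}


def count_model_clusters(units):
    """Return the number of clusters per (region, technology) group."""
    units_per_group = defaultdict(int)
    for state, technology in units:
        region = REGION_MAP[state]
        units_per_group[(region, technology)] += 1
    # A group cannot be split into more clusters than it has units.
    return {
        key: min(NUM_CLUSTERS.get(key, 1), n_units)
        for key, n_units in units_per_group.items()
    }


# Made-up generating units, given as (state, technology) pairs.
units = [
    ("WA", "hydro"), ("OR", "hydro"), ("ID", "hydro"),
    ("AZ", "combined_cycle"), ("AZ", "combined_cycle"),
    ("AZ", "combined_cycle"), ("AZ", "combined_cycle"),
    ("NM", "combined_cycle"),
]
clusters = count_model_clusters(units)
```

In this made-up case, eight units reduce to five clusters: the three Pacific Northwest hydro units become one, while the Arizona combined cycle units keep three clusters of detail.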
The goal of PowerGenome is to let a user make all of these choices in a settings file and then run a single script that generates input files for the power system model. PowerGenome currently generates input files for GenX, and we hope to expand to other models in the near future.
This project pulls data from PUDL. As such, it requires installation of PUDL to access a normalized sqlite database and some of the PUDL convenience functions.
Installation instructions are limited at the moment and will be expanded in the near future.
- Clone this repository to your local machine and navigate to the top level (PowerGenome) folder.
- Create a conda environment named powergenome using the provided environment.yml file.
conda env create -f environment.yml
- Activate the powergenome environment.
conda activate powergenome
- Use pip to install an editable version of this project.
pip install -e .
- Build the local database following PUDL instructions. Skip the first step of creating the pudl conda environment since you have already installed pudl in the powergenome environment. I recommend modifying the example pudl_data line as follows:
pudl_data --sources eia923 eia860 ferc1 epaipm --years 2011 2012 2013 2014 2015 2016 2017
- Be sure to edit the etl_example.yml file to include all years of 860/923 data. Remove all years from epacems.
- Once you have created the sqlite database, change the SETTINGS["pudl_db"] parameter in powergenome/params.py to match the path on your computer.
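Before editing SETTINGS["pudl_db"], it can help to confirm that the path points at a readable sqlite file. The following is a minimal sketch using only the Python standard library. The `list_tables` helper is hypothetical and is not part of PowerGenome or PUDL.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted table names in the sqlite file at db_path."""
    db_path = Path(db_path)
    # sqlite3.connect would silently create a new, empty file,
    # so check that the database exists first.
    if not db_path.is_file():
        raise FileNotFoundError(f"No database found at {db_path}")
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return sorted(name for (name,) in rows)
```

Calling `list_tables` with the path you plan to put in params.py should return a non-empty list of table names if the database was built successfully.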
Settings are controlled in a YAML file. The default is pudl_data_extraction.yml.
The code is currently structured in three main modules: generators.py, transmission.py, and load_profiles.py. Functions from each can be imported and used in an interactive environment (e.g. JupyterLab). To run from the command line, activate the powergenome conda environment, navigate to the powergenome folder, and run:
python extract_pudl_data.py
There are currently 3 arguments that can be used after the script name:
- -sf (--settings_file), the name of an alternative settings YAML file.
- -rf (--results_folder), the name of a results subfolder to save files in. If no subfolder is specified the default is to create one named for the current datetime.
- -a (--all_units), if information on all units should be saved in a separate CSV file. The default value is True.
An example using all three options would be:
python extract_pudl_data.py -sf pudl_data_extraction_CA_present.yml -rf CA-present -a True
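For readers curious how these three options could be handled, here is a minimal argparse sketch. It is an illustration under stated assumptions, not the actual contents of extract_pudl_data.py: the `str2bool` and `parse_args` helpers and the datetime format of the default folder name are hypothetical.

```python
import argparse
from datetime import datetime


def str2bool(value):
    """Convert command-line text such as 'True' or 'false' to a bool."""
    text = str(value).strip().lower()
    if text in {"true", "t", "yes", "y", "1"}:
        return True
    if text in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean, got {value!r}")


def parse_args(argv=None):
    """Parse the three documented options (hypothetical implementation)."""
    parser = argparse.ArgumentParser(description="Generate model input files.")
    parser.add_argument(
        "-sf", "--settings_file",
        default="pudl_data_extraction.yml",
        help="Name of an alternative settings YAML file.",
    )
    parser.add_argument(
        "-rf", "--results_folder",
        default=None,
        help="Name of a results subfolder to save files in.",
    )
    parser.add_argument(
        "-a", "--all_units",
        type=str2bool,
        default=True,
        help="Save information on all units in a separate CSV file.",
    )
    args = parser.parse_args(argv)
    if args.results_folder is None:
        # Assumed format; the README only says the folder is named
        # for the current datetime.
        args.results_folder = datetime.now().strftime("%Y-%m-%d %H.%M.%S")
    return args


# Equivalent to the example command above.
args = parse_args(
    ["-sf", "pudl_data_extraction_CA_present.yml", "-rf", "CA-present", "-a", "True"]
)
```

The explicit `str2bool` converter is used because passing `type=bool` to argparse would treat any non-empty string, including "False", as true.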