PUDL makes US energy data easier to access and work with. Government agencies publish hundreds of gigabytes of supposedly public information, but in a variety of formats that can be hard to work with and combine. PUDL takes these spreadsheets, CSV files, and databases and turns them into easy-to-parse, well-documented tabular data packages that can be loaded into a database or used directly with Python, R, Microsoft Access, and many other tools.
The project currently contains data from:
- EIA Form 860
- EIA Form 923
- The EPA Continuous Emissions Monitoring System (CEMS)
- The EPA Integrated Planning Model (IPM)
- FERC Form 1
We are especially interested in serving researchers, activists, journalists, and policy makers who might not otherwise be able to afford access to this data from commercial data providers.
Just want to play with some example data? Install Anaconda (or Miniconda if you like the command line) with at least Python 3.7. Then work through the following commands in a terminal:
NOTE (2019-09-13): We are in transition to using data packages and SQLite. The following instructions won't work until we release version 0.2.0, which should happen before 2019-09-16. Until then, you'll need to clone the repository to use the datapackage / SQLite version of PUDL.
$ conda create -y -n pudl -c conda-forge --strict-channel-priority python=3.7 catalystcoop.pudl jupyter jupyterlab pip
$ conda activate pudl
$ mkdir pudl-work
$ cd pudl-work
$ pudl_setup
$ pudl_data --sources eia923 eia860 ferc1 epacems epaipm --years 2017 --states id
$ pudl_etl pudl-work/settings/etl_example.yml
$ datapkg_to_sqlite --pkg_bundle_name pudl_example
$ jupyter-lab --notebook-dir=pudl-work/notebooks
This will install the PUDL Python package and its dependencies within a conda environment named pudl, create some local directories inside a directory called pudl-work, download the most recent year of data from the public agencies, generate local data packages, load these into a local SQLite database, and open up a folder with some example Jupyter notebooks in your web browser. The data packages will be generated in a sub-directory in pudl-work/datapackage named pudl_example (you can change this by changing the value of pkg_bundle_name in the ETL settings file you're using).
NOTE: The example above requires a computer with at least 4 GB of RAM and several GB of free disk space. You will also need to download about 500 MB of data. This could take a while if you have a slow internet connection.
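Once the quick start has finished, a quick way to see what it produced is to list the tables in a data package and in the SQLite database. The following is a minimal sketch using only the Python standard library, not part of the PUDL package itself. It assumes that each data package contains a datapackage.json descriptor with a "resources" list, as the Data Package specification describes. The function names are hypothetical, and the file locations are left as arguments because the exact paths are not spelled out here.

```python
import json
import sqlite3
from pathlib import Path


def list_datapackage_resources(descriptor_path):
    """Return the resource names declared in a datapackage.json descriptor."""
    with open(descriptor_path, encoding="utf-8") as f:
        descriptor = json.load(f)
    # Assumption: tables are listed under the "resources" key, each with a
    # "name", following the Data Package specification.
    return [res.get("name") for res in descriptor.get("resources", [])]


def list_sqlite_tables(db_path):
    """Return the names of all tables in a SQLite database, sorted by name."""
    db_path = Path(db_path)
    if not db_path.is_file():
        # Without this check, sqlite3.connect() would create an empty file.
        raise FileNotFoundError(f"No SQLite database at {db_path}")
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

To try it, pass list_datapackage_resources the path to a datapackage.json file found under pudl-work/datapackage/pudl_example, and pass list_sqlite_tables the path to the SQLite file that datapkg_to_sqlite created. Both calls only read metadata, so they are cheap even for large tables.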
For more details, see the full PUDL documentation.
Find PUDL useful? Want to help make it better? There are lots of ways to contribute!
- Please be sure to read our Code of Conduct.
- You can file a bug report, make a feature request, or ask questions in the GitHub issue tracker.
- Feel free to fork the project and make a pull request with new code, better documentation, or example notebooks.
- Make a financial contribution to support our work liberating public energy data.
- Hire us to do some custom analysis, and let us add the code to the project.
- For more information, check out our Contribution Guidelines.
The PUDL software is released under the MIT License. The PUDL documentation and the data packages we distribute are released under the Creative Commons Attribution 4.0 License.
For help with initial setup, usage questions, bug reports, suggestions to make PUDL better, and anything else that could conceivably be of use or interest to the broader community of users, use the PUDL issue tracker on GitHub. For private communication about the project, you can email the team: pudl@catalyst.coop
Catalyst Cooperative is a small group of data scientists and policy wonks. We’re organized as a worker-owned cooperative consultancy. Our goal is a more just, livable, and sustainable world. We integrate public data and perform custom analyses to inform public policy making. Our focus is primarily on mitigating climate change and improving electric utility regulation in the United States.
Do you work on renewable energy or climate policy? Have you found yourself scraping data from government PDFs, spreadsheets, websites, and databases, without getting something reusable? We build tools to pull this kind of information together reliably and automatically so you can focus on your real work instead — whether that’s political advocacy, energy journalism, academic research, or public policy making.
- Web: https://catalyst.coop
- Newsletter: https://catalyst.coop/updates/
- Email: hello@catalyst.coop
- Twitter: @CatalystCoop