singularity-energy/open-grid-emissions

Update OGE to work with the next PUDL release / 2021 data

grgmiller opened this issue · 2 comments

The latest release of PUDL includes many updates and potentially breaking changes that we will need to address. The changes I've flagged so far include:

  • Update the environment to be compatible with pudl
  • Update code/warnings to allow 2021 data
  • The EPA-EIA crosswalk is now integrated into pudl. We should look into whether we still need to separately download this
  • prime_mover_code has moved from generators_entity_eia to generators_eia860 since it is no longer considered a static attribute
  • Check for updates to the operational_status_code encoding
  • Several attributes were moved from plants_entity_eia to plants_eia860, including balancing_authority_code_eia, balancing_authority_name_eia, grid voltage columns, and iso codes
  • grid_voltage_kv was renamed to grid_voltage_1_kv
  • Check balancing_authorities_eia for changes to BA encoding, including changes to PACW and PACE
  • Several columns in CEMS were renamed (e.g. unitid -> emissions_unit_id_epa), and facility_id was dropped.
  • Missing values in CEMS are no longer replaced with zeros, so we no longer need the method that re-creates these missing values.
  • We may no longer need to convert plant_id_eia in CEMS to plant_id_epa before converting to plant_id_eia
  • There were changes made to allocate_net_gen before merging - we should double check that everything still works.
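
The column renames and drops listed above could be tracked with a small compatibility helper while the code is migrated. This is only a sketch: `COLUMN_RENAMES`, `DROPPED_COLUMNS`, `migrate_columns`, and `find_stale_references` are hypothetical names that do not exist in OGE or pudl; only the column names themselves come from the list above.

```python
# Hypothetical helper, not part of OGE or pudl.
# Old -> new column names, taken from the list above.
COLUMN_RENAMES = {
    "unitid": "emissions_unit_id_epa",       # CEMS
    "grid_voltage_kv": "grid_voltage_1_kv",  # plant attributes
}
# Columns that are no longer provided upstream.
DROPPED_COLUMNS = {"facility_id"}  # CEMS


def migrate_columns(columns):
    """Translate old-style column names to the new names.

    Returns (new_columns, dropped), where `dropped` lists the old
    columns that no longer have an equivalent.
    """
    new_columns = []
    dropped = []
    for col in columns:
        if col in DROPPED_COLUMNS:
            dropped.append(col)
        else:
            new_columns.append(COLUMN_RENAMES.get(col, col))
    return new_columns, dropped


def find_stale_references(columns_used):
    """Return the old names in `columns_used` that still need updating."""
    stale = set(COLUMN_RENAMES) | DROPPED_COLUMNS
    return sorted(c for c in columns_used if c in stale)
```

If the data is held in a pandas DataFrame, `df.rename(columns=COLUMN_RENAMES)` applies the same mapping directly.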

Additional to-dos:

  • Update download functions to grab 2021 data from zenodo
  • Double check load data functions for environmental tables to make sure workbook formats are the same for 2021.
  • Consider deleting load_data.crosswalk_epa_eia_plant_ids() if it is no longer used by any functions. pudl now does this crosswalk, but does not include all of the manual crosswalk entries.
  • Double check that we don't have any dependencies on eGRID since the 2021 version is not yet published
  • Double check all of the manual cleaning of EIA-930 data is updated for year 2021 data
  • Go through all files in data/manual to update for 2021 if necessary
  • Test that co2-eq functions are using AR6 for 2021 data
  • Update the environment name after testing is done, and change pudl dependency
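
One way to approach the eGRID dependency check in the list above is to scan the source tree for any mention of eGRID. This is a minimal sketch, not an existing OGE utility: `find_egrid_references` is a hypothetical name, and the directory to scan is left to the caller because the repository layout is not assumed here.

```python
import os
import re

# Case-insensitive so that "eGRID", "egrid", and "EGRID" all match.
EGRID_PATTERN = re.compile(r"egrid", re.IGNORECASE)


def find_egrid_references(source_dir):
    """Return (path, line_number, line) for each .py line mentioning eGRID."""
    hits = []
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            if not name.endswith(".py"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if EGRID_PATTERN.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Each hit still needs a manual look, since a match may be a comment or an optional validation step rather than a hard dependency.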

Meta to-do:

  • Create a checklist of everything we need to check / update each time a new year of data is released
  • Consider whether it is worth running the PUDL pipeline locally in the future instead of waiting for PUDL release.

First, test the pipeline with year 2020 data after the update to make sure everything is working as expected and there are no major changes to the 2020 data. Then run it for 2021.
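
The 2020 regression check could be sketched as a comparison of summary totals produced before and after the update. Everything here is an assumption for illustration: `compare_totals` is a hypothetical helper, the inputs are plain `{key: value}` mappings (for example, a total per balancing authority), and the 1% tolerance is an arbitrary placeholder for what counts as a "major change".

```python
import math


def compare_totals(before, after, rel_tol=0.01):
    """Compare two {key: value} mappings of summary totals.

    Returns {key: (before_value, after_value)} for every key that is
    missing from one side (reported as None) or whose values differ by
    more than `rel_tol` in relative terms.
    """
    differences = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key)
        new = after.get(key)
        if old is None or new is None:
            # Key appeared or disappeared between the two runs.
            differences[key] = (old, new)
        elif not math.isclose(old, new, rel_tol=rel_tol):
            differences[key] = (old, new)
    return differences
```

An empty result means the two runs agree within the tolerance. Keys that appear or disappear are worth checking against the balancing authority encoding changes noted above.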