singularity-energy/open-grid-emissions

Missing data due to fuel code mismatch between EIA-860 and EIA-923


We've noticed large year-to-year fluctuations in the fleet emission rate for petroleum in MISO. This issue seems to have existed prior to v0.2.0, but we never noticed it before.

2019: 14,726,036,229.73 lbCO2 / 566,287 MWh = 25,928.32 lb/MWh
2020: 18,206,127,107.72 lbCO2 / 7,427,800 MWh = 2,451.08 lb/MWh
2021: 22,809,325,967 lbCO2 / 28,720,749 MWh = 794.18 lb/MWh

Steps:

  • Implement a check to identify large year-over-year differences in fleet emission rates (how widespread is this issue?)
  • This suggests that we should probably implement automated anomaly detection to catch these issues in the future.
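A minimal sketch of such a check follows. It assumes a hypothetical long-format table of annual fleet totals with columns `ba`, `fuel_category`, `year`, `co2_mass_lb`, and `net_generation_mwh`. These names are illustrative and are not the pipeline's actual schema. The check flags any year whose fleet emission rate differs from the previous year's by more than a chosen fold change. The threshold of 3 is an arbitrary assumption. The example rows reuse the MISO petroleum totals listed above.

```python
import pandas as pd


def flag_rate_jumps(fleet: pd.DataFrame, max_fold_change: float = 3.0) -> pd.DataFrame:
    """Flag year-over-year fleet emission rate changes larger than max_fold_change.

    Assumes hypothetical columns: ba, fuel_category, year, co2_mass_lb, net_generation_mwh.
    """
    out = fleet.sort_values(["ba", "fuel_category", "year"]).copy()
    out["rate_lb_per_mwh"] = out["co2_mass_lb"] / out["net_generation_mwh"]
    # Previous year's rate for the same BA and fuel category (NaN for the first year)
    previous = out.groupby(["ba", "fuel_category"])["rate_lb_per_mwh"].shift(1)
    ratio = out["rate_lb_per_mwh"] / previous
    # Treat a 10x drop the same as a 10x increase
    out["fold_change"] = ratio.where(ratio >= 1, 1 / ratio)
    # NaN fold changes (first year) compare as False, so they are never flagged
    out["flagged"] = out["fold_change"] > max_fold_change
    return out


# MISO petroleum fleet totals reported above
miso_petroleum = pd.DataFrame(
    {
        "ba": ["MISO"] * 3,
        "fuel_category": ["petroleum"] * 3,
        "year": [2019, 2020, 2021],
        "co2_mass_lb": [14_726_036_229.73, 18_206_127_107.72, 22_809_325_967.0],
        "net_generation_mwh": [566_287, 7_427_800, 28_720_749],
    }
)
result = flag_rate_jumps(miso_petroleum)
print(result[["year", "rate_lb_per_mwh", "fold_change", "flagged"]])
```

The sketch uses a fold change rather than an absolute difference. This keeps the check comparable across fuel categories whose typical rates differ by an order of magnitude.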

My guess is that this issue most likely results from one of the following:

  • Missing data in CEMS that is not getting flagged for imputation
  • An issue with the data crosswalking between EIA and EPA
  • An issue with the gross-to-net generation conversion (although no widespread issues were flagged in the pipeline).

After digging into this in more detail, this specific issue appears to be caused by a single plant, and it may be a symptom of several broader issues.

As we showed above, in 2019, total petroleum fleet emissions were calculated as ~14.7 billion lb. Of this total, approximately 14 billion lb comes from a single plant ("Columbia", id 8023). There are a couple of issues with this:

  • Plant 8023 is actually a coal plant (it uses a small amount of DFO for startup)
  • While CEMS reports over 5.8 million MWh of generation from this plant in 2019, EIA only reports about 3 thousand MWh of generation.
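The 2019 breakdown above, with one plant contributing ~14 billion of the ~14.7 billion lb total, is the kind of concentration that a simple per-plant share check can surface. The following is a minimal sketch. It assumes a hypothetical plant-level table with columns `plant_id` and `co2_mass_lb`. The 50% threshold is an arbitrary assumption. The second row lumps all remaining plants together rather than listing them individually.

```python
import pandas as pd


def dominant_plants(plants: pd.DataFrame, min_share: float = 0.5) -> pd.DataFrame:
    """Return plants whose share of the fleet's total CO2 exceeds min_share.

    Assumes hypothetical columns: plant_id, co2_mass_lb.
    """
    shares = plants.assign(share=plants["co2_mass_lb"] / plants["co2_mass_lb"].sum())
    return shares.loc[shares["share"] > min_share].sort_values("share", ascending=False)


# 2019 MISO petroleum fleet: plant 8023 vs. everything else (approximate figures from above)
fleet_2019 = pd.DataFrame(
    {
        "plant_id": ["8023", "all other plants"],
        "co2_mass_lb": [14_000_000_000.0, 14_726_036_229.73 - 14_000_000_000.0],
    }
)
suspects = dominant_plants(fleet_2019)
print(suspects)
```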

This suggests that there are two big issues:

  1. There is something wrong with our plant primary fuel identification
  2. Our gross-to-net generation conversion is "accurate" in that it faithfully converts 5.8 million MWh to 3,000 MWh, but we should not allow this when the discrepancy between gross and net generation is that large. In this situation, the algorithm should fall back to a more reasonable default gross-to-net ratio.
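The following is a minimal sketch of the fallback described in item 2. It assumes a hypothetical plausible range for the annual net/gross ratio (0.5 to 1.2) and a hypothetical default ratio of 0.9. Neither value comes from the pipeline. A real implementation would likely use fuel- or prime-mover-specific defaults.

```python
def net_generation_mwh(
    gross_mwh: float,
    eia_net_mwh: float,
    min_ratio: float = 0.5,  # hypothetical lower bound on a plausible net/gross ratio
    max_ratio: float = 1.2,  # hypothetical upper bound on a plausible net/gross ratio
    default_ratio: float = 0.9,  # hypothetical fallback net/gross ratio
) -> float:
    """Return net generation for a plant-year, using the EIA-reported net value only
    when its implied net/gross ratio is plausible; otherwise apply default_ratio."""
    if gross_mwh <= 0:
        # No gross generation to scale; keep the EIA value as-is
        return eia_net_mwh
    ratio = eia_net_mwh / gross_mwh
    if min_ratio <= ratio <= max_ratio:
        return eia_net_mwh
    return gross_mwh * default_ratio


# Plant 8023 in 2019: ~5.8 million MWh gross (CEMS) vs. ~3,000 MWh net (EIA)
print(net_generation_mwh(5_800_000, 3_000))  # falls back to the default ratio
```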

What else do we know about what's going on?

  • The pipeline is defaulting to use CEMS data for this plant
  • The cleaned CEMS data for this plant is being assigned a fuel code of DFO, which is incorrect.
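One common approach to assigning a plant's primary fuel is to pick the energy source with the largest annual fuel consumption (mmbtu). It is not stated here whether the pipeline uses this rule, so treat the following sketch as an assumption. The table and column names (`plant_id_eia`, `energy_source_code`, `fuel_consumed_mmbtu`) are hypothetical. The plant and its values are made up. Under this rule, a coal plant that only burns DFO for startup would be labeled with its coal code rather than DFO.

```python
import pandas as pd


def primary_fuel_by_consumption(gf: pd.DataFrame) -> pd.Series:
    """Map each plant to the energy source code with the largest total fuel consumption.

    Assumes hypothetical columns: plant_id_eia, energy_source_code, fuel_consumed_mmbtu.
    """
    totals = gf.groupby(["plant_id_eia", "energy_source_code"], as_index=False)[
        "fuel_consumed_mmbtu"
    ].sum()
    # Keep only the highest-consumption code for each plant
    top = totals.sort_values("fuel_consumed_mmbtu", ascending=False).drop_duplicates(
        "plant_id_eia"
    )
    return top.set_index("plant_id_eia")["energy_source_code"]


# Hypothetical coal plant that burns a small amount of DFO at startup (made-up values)
gf = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "energy_source_code": ["RC", "SUB", "DFO"],
        "fuel_consumed_mmbtu": [50_000_000.0, 2_000_000.0, 30_000.0],
    }
)
primary = primary_fuel_by_consumption(gf)
print(primary)
```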

I'm going to poke around with this plant and see what I can find out.

So it looks like the issue may be in the pudl.analysis.allocate_net_gen.allocate_gen_fuel_by_generator_energy_source() function, which appears to be dropping some data.

Here's what the raw EIA-923 data says is the generation and fuel consumption of plant 8023 in 2019:

And here's what the output of the generation fuel allocation is:

It looks like the issue may be that the RC fuel code is getting dropped or is not merging correctly.

To be continued...

After further investigation, it looks like the issue lies partially in the raw EIA-860 data and partially in our pudl allocate_net_gen function.

For plant 8023, EIA-923 associates most of the plant's 2019 fuel consumption with the RC energy source code. In EIA-860, however, none of the energy source codes associated with this plant are RC (in fact, both energy_source_code_1 and energy_source_code_2 are SUB, which seems strange...). This means that when our allocate_net_gen function uses energy_source_code as a merge key, it does not find RC in the gens table and therefore drops all of this data.
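The failure mode can be reproduced with a toy example. This is a minimal sketch, not the actual allocate_net_gen code. The two tables and their columns (`plant_id_eia`, `prime_mover_code`, `energy_source_code`, `generator_id`, `net_generation_mwh`) are simplified assumptions. The gf rows and values are made up, loosely modeled on plant 8023. An inner merge on the energy source code silently drops the RC row. By contrast, a left merge with `indicator=True` keeps that row and marks it as unmatched.

```python
import pandas as pd

KEYS = ["plant_id_eia", "prime_mover_code", "energy_source_code"]

# Toy EIA-923 generation-fuel (gf) rows for one plant-prime mover (made-up values)
gf = pd.DataFrame(
    {
        "plant_id_eia": [8023, 8023],
        "prime_mover_code": ["ST", "ST"],
        "energy_source_code": ["SUB", "RC"],
        "net_generation_mwh": [100.0, 900.0],
    }
)
# Toy EIA-860 generator (gens) rows: the only code listed for the generator is SUB
gens = pd.DataFrame(
    {
        "plant_id_eia": [8023],
        "prime_mover_code": ["ST"],
        "generator_id": ["1"],
        "energy_source_code": ["SUB"],
    }
)

# An inner merge on energy_source_code silently drops the RC row
inner = gf.merge(gens, on=KEYS, how="inner")

# A left merge with indicator=True keeps it and marks it as "left_only"
checked = gf.merge(gens, on=KEYS, how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", KEYS + ["net_generation_mwh"]]

print(inner)
print(unmatched)
```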
I'm not sure how widespread this issue might be yet, but there may be a relatively simple patch we could implement in the allocate_net_gen function: for each plant-pm, we could check whether all of the energy source codes reported in the gf table also exist in the gens table. If not, we could add any missing energy source codes to the gens table so that the merge doesn't drop this data (although if we are doing an "outer" merge, should this matter?)
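Below is a minimal sketch of the proposed patch. It uses simplified gf and gens tables with hypothetical column names. The function name `add_missing_energy_sources` is also hypothetical. The patch assumes that each missing code can be attached to every generator at the same plant and prime mover, which is an assumption rather than something the data confirms. A plant-pm with no generators at all in gens would still be dropped. On the outer-merge question: in a toy pandas example like this one, an outer merge would keep the RC row, but with a missing `generator_id`, so it still could not be attributed to any generator.

```python
import pandas as pd

KEYS = ["plant_id_eia", "prime_mover_code", "energy_source_code"]


def add_missing_energy_sources(gf: pd.DataFrame, gens: pd.DataFrame) -> pd.DataFrame:
    """Return gens with extra rows for energy source codes reported in gf but absent from gens.

    Hypothetical patch: each missing code is attached to every generator sharing the
    same plant and prime mover, so a later merge on KEYS no longer drops those gf rows.
    """
    gf_codes = gf[KEYS].drop_duplicates()
    gens_codes = gens[KEYS].drop_duplicates()
    # Anti-join: plant-pm-code combinations present in gf but not in gens
    missing = gf_codes.merge(gens_codes, on=KEYS, how="left", indicator=True)
    missing = missing.loc[missing["_merge"] == "left_only", KEYS]
    if missing.empty:
        return gens
    # One new row per (existing generator at that plant-pm) x (missing code)
    generators = gens.drop(columns="energy_source_code").drop_duplicates()
    new_rows = generators.merge(missing, on=["plant_id_eia", "prime_mover_code"], how="inner")
    return pd.concat([gens, new_rows[gens.columns]], ignore_index=True)


# Usage: gf reports RC, but gens only lists SUB (made-up values)
gf = pd.DataFrame(
    {
        "plant_id_eia": [8023, 8023],
        "prime_mover_code": ["ST", "ST"],
        "energy_source_code": ["SUB", "RC"],
        "net_generation_mwh": [100.0, 900.0],
    }
)
gens = pd.DataFrame(
    {
        "plant_id_eia": [8023],
        "prime_mover_code": ["ST"],
        "generator_id": ["1"],
        "energy_source_code": ["SUB"],
    }
)
patched = add_missing_energy_sources(gf, gens)
merged = gf.merge(patched, on=KEYS, how="inner")
print(merged)
```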

After investigating further, we found that in 2019 this bug affected 49 individual plants and caused up to ~52,000,000 MWh of generation and ~528,000,000 mmbtu of fuel consumption (primarily coal and petroleum) to be dropped from the dataset. The cause is that a fuel code that exists in the EIA-923 generation and fuel table is not listed as one of the fuel codes in the EIA-860 generator table, and we did not previously catch this in the pudl allocate_net_gen code.
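The following sketch shows one way to tally totals like these. It assumes simplified gf and gens tables. The column names and the function name `summarize_unmatched` are hypothetical, and the values are made up. A left merge with `indicator=True` isolates the gf rows whose plant, prime mover, and energy source code combination has no match in gens. The sketch then counts the affected plants and sums their generation and fuel consumption.

```python
import pandas as pd

KEYS = ["plant_id_eia", "prime_mover_code", "energy_source_code"]


def summarize_unmatched(gf: pd.DataFrame, gens: pd.DataFrame) -> dict:
    """Count plants and total generation/fuel in gf rows whose energy source code
    has no match in gens for the same plant and prime mover.

    Assumes hypothetical gf columns net_generation_mwh and fuel_consumed_mmbtu.
    """
    flagged = gf.merge(gens[KEYS].drop_duplicates(), on=KEYS, how="left", indicator=True)
    unmatched = flagged.loc[flagged["_merge"] == "left_only"]
    return {
        "plants_affected": int(unmatched["plant_id_eia"].nunique()),
        "net_generation_mwh": float(unmatched["net_generation_mwh"].sum()),
        "fuel_consumed_mmbtu": float(unmatched["fuel_consumed_mmbtu"].sum()),
    }


# Hypothetical tables for two plants (made-up values)
gf = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 2],
        "prime_mover_code": ["ST", "ST", "GT", "GT"],
        "energy_source_code": ["SUB", "RC", "DFO", "RFO"],
        "net_generation_mwh": [100.0, 50.0, 10.0, 5.0],
        "fuel_consumed_mmbtu": [1_000.0, 600.0, 120.0, 60.0],
    }
)
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "prime_mover_code": ["ST", "GT"],
        "generator_id": ["1", "GT1"],
        "energy_source_code": ["SUB", "DFO"],
    }
)
summary = summarize_unmatched(gf, gens)
print(summary)
```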