singularity-energy/open-grid-emissions

Hourly data not validated for overlapping data


It looks like an argument, validate, was added to data_cleaning.combine_plant_data() with a default value of False. When validate is False, the function does not call validation.ensure_non_overlapping_data_from_all_sources().

In step 15 of the data pipeline, combine_plant_data() is called on the hourly data without a validate argument. As a result, the hourly data is never checked for overlapping data by default.

The default value of this argument should be True.
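A minimal sketch of the proposed default, using simplified stand-ins rather than the pipeline's real code: records are plain tuples instead of DataFrames, and the record layout and source labels are hypothetical. With validate defaulting to True, a call site like step 15 that omits the argument still gets the overlap check, and skipping it requires an explicit validate=False.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (source, plant_id, subplant_id, datetime_utc)
Record = Tuple[str, int, int, str]


def ensure_non_overlapping_data_from_all_sources(records: Iterable[Record]) -> None:
    """Raise if the same plant/subplant/timestamp is reported by more than one source.

    Simplified, hypothetical stand-in for validation.ensure_non_overlapping_data_from_all_sources().
    """
    first_source: Dict[Tuple[int, int, str], str] = {}
    for source, plant_id, subplant_id, dt in records:
        key = (plant_id, subplant_id, dt)
        if key in first_source and first_source[key] != source:
            raise ValueError(
                f"overlapping data for {key}: {first_source[key]} and {source}"
            )
        first_source.setdefault(key, source)


def combine_plant_data(*sources: List[Record], validate: bool = True) -> List[Record]:
    """Concatenate per-source records; the overlap check runs unless explicitly disabled."""
    combined = [rec for src in sources for rec in src]
    if validate:
        ensure_non_overlapping_data_from_all_sources(combined)
    return combined
```

Making the check opt-out rather than opt-in means a call site that forgets the argument fails loudly on overlapping data instead of silently passing it through.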

It looks like this validation check was run for the monthly data but not for the hourly data because the check compares data by subplant ID, which is missing from the shaped EIA data. Since the check is run when the monthly data is exported, there should be no duplicated data after that point.
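To illustrate why the check depends on subplant ID, here is a minimal sketch that keys each row on plant, subplant, and time period. The function name find_overlaps, the column names, and the use of plain dicts instead of DataFrames are all hypothetical. Monthly rows that carry subplant_id can be compared across sources, while shaped hourly EIA rows without subplant_id cannot.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def find_overlaps(rows_by_source: Dict[str, List[Row]], time_col: str) -> Dict[Tuple, Set[str]]:
    """Return the (plant_id, subplant_id, time) keys reported by more than one source.

    Hypothetical helper: every row must carry a subplant_id, because the
    comparison is made at the subplant level.
    """
    sources_per_key: Dict[Tuple, Set[str]] = defaultdict(set)
    for source, rows in rows_by_source.items():
        for row in rows:
            if "subplant_id" not in row:
                # e.g. shaped hourly EIA data, which has no subplant_id column
                raise ValueError(
                    f"{source} rows lack subplant_id; cannot compare at the subplant level"
                )
            key = (row["plant_id"], row["subplant_id"], row[time_col])
            sources_per_key[key].add(source)
    # Keep only keys that more than one source reported
    return {key: srcs for key, srcs in sources_per_key.items() if len(srcs) > 1}
```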