Creative Commons Attribution 4.0 International


Find archived versions of the code used to clean this data: DOI

Most current data releases

The cleaned data have been updated since publication of the methods paper in early 2020 (https://doi.org/10.1038/s41597-020-0483-x; see the citation below), which is discussed throughout this repository. The most current cleaned data are in data/release_2020_Oct/ and cover 5 full years, from 2 July 2015 through 1 July 2020; they follow the methods discussed in the paper exactly. Additionally, we have modified the method to incorporate subregion-level data for the balancing authorities that provide it. The resulting 2 full years of cleaned data, spanning 1 October 2018 through 30 September 2020, are in the data/release_2020_Oct_include_subregions/ directory. Details of both updates can be found in the README files at the paths above.

Overview and Citation

In the raw hourly electricity demand data queried from the U.S. Energy Information Administration (EIA) on 10 September 2019, 2.2% of hourly values are missing. We have developed a data cleaning process that first flags anomalous demand values, which constitute about 0.5% of the total data, and then imputes both the missing values and the flagged values using a Multiple Imputation by Chained Equations (MICE) technique. The MICE technique provides complete data sets without any extremely anomalous values.

Full documentation of the cleaning process has been published in Scientific Data.

Please consider citing:

Ruggles, T.H., Farnham, D.J., Tong, D. et al. Developing reliable hourly electricity demand data through screening and imputation. Sci Data 7, 155 (2020). https://doi.org/10.1038/s41597-020-0483-x

and the data archive:

Ruggles, Tyler H., & Farnham, David J. (2020). EIA Cleaned Hourly Electricity Demand Data (Version v1.1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3690240

The raw data were queried from the EIA database on 10 September 2019 and span from the initial EIA data entries on 2015-07-01 05:00:00 UTC to 2019-08-31 23:00:00 UTC. The first day of data has been removed because of significant reporting inconsistencies. The cleaned data span exactly 4 years, from the start of 2 July 2015 through the end of 1 July 2019.
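As a quick sanity check on the stated 4-year span, the number of hourly records it implies can be computed directly. This is only a sketch: it assumes one row per hour with no gaps, and it ignores whether the day boundaries in the files are expressed in UTC or in a local time zone, which this README does not state.

```python
from datetime import datetime, timedelta

# Span stated above: start of 2 July 2015 through end of 1 July 2019.
start = datetime(2015, 7, 2)
end = datetime(2019, 7, 2)  # exclusive bound: first instant after 1 July 2019

# Number of whole hours in the span (includes the leap day 29 February 2016).
n_hours = (end - start) // timedelta(hours=1)
n_days = (end - start).days

print(n_days, n_hours)
```

Comparing this count with the number of rows in a cleaned file is one way to confirm that a download is complete.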

Original Data Source

Original hourly demand data is collected from electric balancing authorities by the U.S. Energy Information Administration (EIA) via Form-930. Raw data is made available to the public through the EIA Open Data portal documented here:


The specific hourly demand data used is originally located here:


At the end of this README is a list of the 67 balancing authorities in the contiguous U.S. Demand data is provided for the 56 balancing authorities which have demand. The 11 balancing authorities that include generation only are denoted as such.

The EIA began collecting hourly demand data in July of 2015 and continuously publishes new values each day.

The reported demand value for each hour corresponds to the integrated mean value in megawatts over the previous hour.
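Because each value is a mean over the preceding hour, a timestamp marks the end of its averaging window, and the mean power in MW over one hour is numerically equal to the energy in MWh for that hour. The sketch below illustrates this with made-up values; the column name energy (MWh) and the numbers are hypothetical and do not come from the repository.

```python
import pandas as pd

# Made-up hourly values for illustration only.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2019-01-01 01:00:00",
        "2019-01-01 02:00:00",
        "2019-01-01 03:00:00",
    ]),
    "cleaned demand (MW)": [40000.0, 39000.0, 38500.0],
})

# Each timestamp closes a one-hour window, so the window starts one hour earlier.
df["interval_start"] = df["date_time"] - pd.Timedelta(hours=1)

# Mean power (MW) over a 1-hour window times 1 hour gives energy (MWh).
df["energy (MWh)"] = df["cleaned demand (MW)"] * 1.0
total_energy_mwh = df["energy (MWh)"].sum()

print(df[["interval_start", "date_time", "energy (MWh)"]])
print(total_energy_mwh)
```

This matters mostly when aligning demand with other hourly series that label each hour by its start rather than its end.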

Original Files

The original data from the 10 September 2019 data query is located in directory data/release_2019_Oct/original_eia_files/ and can be used to compare results or analyze other methods of cleaning.

Available Cleaned Data

The final data product is available to everyone. As the hourly demand data is a continuously growing data record in the EIA database, we plan to update this repository with new cleaned data annually.

Data is stored in CSV format with each row corresponding to an hour of demand information. The date_time value is stored in UTC.
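Since date_time is in UTC, daily load shapes look shifted unless the timestamps are converted to a local time zone. A minimal sketch with pandas follows; the timestamps are made up, and the choice of America/Chicago (for a Texas balancing authority such as ERCOT) is an assumption for illustration.

```python
import pandas as pd

# Made-up UTC timestamps, parsed the same way as the date_time column.
utc_times = pd.Series(pd.to_datetime([
    "2019-07-01 05:00:00",
    "2019-07-01 06:00:00",
]))

# Mark the naive timestamps as UTC, then convert to a local time zone.
# "America/Chicago" is an assumed zone chosen for this example.
local_times = utc_times.dt.tz_localize("UTC").dt.tz_convert("America/Chicago")

print(local_times)
```

Converting rather than subtracting a fixed offset keeps daylight saving transitions correct across the multi-year record.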

The data can be accessed at different levels of geographic granularity ranging from the most granular balancing authority level to the contiguous U.S.

For reference, at the balancing authority level, we retain the original raw EIA demand data in the final cleaned product (raw demand (MW)). See the next section for details.

Balancing Authority Level Data

The most granular results are for the 54 balancing authorities in this directory:


At the balancing authority level, there are 4 values associated with each hourly interval.

  • raw demand (MW) - the raw demand values as queried from EIA; missing values are filled with MISSING or EMPTY
  • category - the classification of each hourly raw demand (MW) value via the anomaly screening process
  • cleaned demand (MW) - cleaned demand values with missing and anomalous values replaced by imputed values
  • forecast demand (MW) - the day-ahead demand forecast by the balancing authority, as returned from the EIA database. These values are NOT used anywhere in the cleaning process but are kept as a reference for others; as with raw demand (MW), missing values are filled with MISSING or EMPTY
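The four columns above can be inspected together to see how many hours were changed by the cleaning. The sketch below uses a small made-up CSV in place of a real file; the header layout is assumed from the column names listed above, and the category labels (OKAY, MISSING, FLAGGED) are hypothetical placeholders, not necessarily the labels used in the repository. Restricting na_values to the two demand columns keeps any category label that happens to read MISSING from being parsed as NaN.

```python
import io
import pandas as pd

# Made-up stand-in for a balancing authority file; values and category
# labels are hypothetical.
sample_csv = io.StringIO(
    "date_time,raw demand (MW),category,cleaned demand (MW),forecast demand (MW)\n"
    "2019-01-01 01:00:00,40000,OKAY,40000,40500\n"
    "2019-01-01 02:00:00,MISSING,MISSING,39200,EMPTY\n"
    "2019-01-01 03:00:00,999999,FLAGGED,38500,38900\n"
)

placeholders = ["MISSING", "EMPTY"]
df = pd.read_csv(
    sample_csv,
    # Only treat the placeholders as NaN in the numeric demand columns.
    na_values={
        "raw demand (MW)": placeholders,
        "forecast demand (MW)": placeholders,
    },
    parse_dates=["date_time"],
)

# How many hours fall into each screening category.
category_counts = df["category"].value_counts()

# Hours where the cleaned value differs from the raw value or the raw is missing.
changed = df["raw demand (MW)"].isna() | (
    df["raw demand (MW)"] != df["cleaned demand (MW)"]
)
n_changed = int(changed.sum())

print(category_counts)
print(n_changed)
```

With a real file, replace sample_csv with the path to that file.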

Two of the balancing authorities, SEC and OVEC, have reporting problems significant enough that we do not impute cleaned data for them. No results files are included for these two balancing authorities.

Regional Level Data

Included in the table at the bottom of this README is the mapping of each balancing authority to 13 geographic regions. We provide regional aggregates corresponding to this mapping. The regional files contain raw demand (MW) and cleaned demand (MW) values for each hour. We replace cases of MISSING or EMPTY raw demand (MW) values with 0 before aggregating. The regional data is in this directory:
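The aggregation described above can be sketched as follows. The two balancing authority names (BA_ONE, BA_TWO) and all values are hypothetical, and this is an illustration of the zero-fill-then-sum step rather than the repository's actual aggregation code.

```python
import pandas as pd

hours = pd.to_datetime([
    "2019-01-01 01:00:00",
    "2019-01-01 02:00:00",
    "2019-01-01 03:00:00",
])

# Made-up raw demand; None stands for a MISSING or EMPTY placeholder
# that has already been parsed as NaN.
ba_raw = pd.DataFrame(
    {"BA_ONE": [100.0, None, 120.0], "BA_TWO": [50.0, 55.0, None]},
    index=hours,
)

# Made-up cleaned demand; complete by construction.
ba_cleaned = pd.DataFrame(
    {"BA_ONE": [100.0, 110.0, 120.0], "BA_TWO": [50.0, 55.0, 60.0]},
    index=hours,
)

region = pd.DataFrame({
    # Missing raw values are replaced with 0 before summing across BAs.
    "raw demand (MW)": ba_raw.fillna(0).sum(axis=1),
    "cleaned demand (MW)": ba_cleaned.sum(axis=1),
})

print(region)
```

One consequence visible in the sketch is that the regional raw total is lower than the cleaned total in any hour where a member balancing authority had a missing raw value, so the raw regional series should be read with that in mind.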


Interconnects Data

There are three interconnects in the contiguous U.S. electric grid (see https://www.eia.gov/todayinenergy/detail.php?id=27152). As with the regional data files, we aggregate the balancing authority level results into the three interconnects. NOTE: the contributions from Mexico and Canada are NOT included in these interconnect files. The interconnect data is in this directory:


Contiguous U.S. Data

All 54 balancing authorities (excludes SEC and OVEC as discussed above) are aggregated to create a contiguous U.S. total. Please see this directory:


Accessing the Data / Repository Checkout

To check out the cleaned demand data and plot a simple time series for your favorite balancing authority (ERCOT, file code ERCO, in the example), run the following commands. They assume the Python libraries pandas and matplotlib are already installed.

git clone git@github.com:truggles/EIA_Cleaned_Hourly_Electricity_Demand_Data.git
cd EIA_Cleaned_Hourly_Electricity_Demand_Data
python -i
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> df = pd.read_csv('data/release_2019_Oct/balancing_authorities/ERCO.csv', na_values=['MISSING','EMPTY'])
>>> df['date_time'] = pd.to_datetime(df['date_time'])
>>> fig, axs = plt.subplots(2)
>>> axs[0].plot(df['date_time'], df['cleaned demand (MW)'])
>>> axs[1].plot(df.loc[1000:1250, 'date_time'], df.loc[1000:1250, 'cleaned demand (MW)'])
>>> plt.show()
>>> exit()

The MICE Imputation Process

We include a directory for people interested in further details of the imputation process.


The two corresponding files in this directory are compressed into a single zip file. These files should only be used by those interested in the details of the imputation process. See the README in that directory for more details.

Table of Acronyms and Mappings

This table shows the 67 balancing authorities in the contiguous U.S. along with the code used to identify each one's files within this repository. For a geographic map, please see:


