HR-Extreme is a dataset of high-resolution feature maps of physical variables, built to evaluate how well state-of-the-art models predict extreme weather. It covers 17 types of extreme weather events that occurred during 2020 and is derived from HRRR (High-Resolution Rapid Refresh) data. The dataset is intended to support weather forecasting research, from physics-based methods to deep learning techniques. Full paper link (under review)
The code for constructing the dataset is available on GitHub:
The dataset is organized into two directories:
202001_202006: Data from January 2020 to June 2020
202007_202012: Data from July 2020 to December 2020
Each directory contains the dataset in WebDataset format, following Hugging Face recommendations. Every 10 .npz files are aggregated into a single .tar file, named sequentially as i.tar, where i is an integer (e.g., 0001.tar).
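As a rough illustration of the layout described above, the sketch below opens one .tar shard and loads each .npz file it contains, using only Python's tarfile module and NumPy. The function name iter_npz_from_shard and the example path in the trailing comment are hypothetical, and the array names stored inside each .npz file are not documented here, so the code simply lists whatever keys it finds.

```python
import io
import tarfile

import numpy as np


def iter_npz_from_shard(tar_path):
    """Yield (member_name, arrays) for every .npz file inside one .tar shard.

    Hypothetical helper, not part of the dataset's own code.
    """
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            # Skip directories and any non-.npz members.
            if not member.isfile() or not member.name.endswith(".npz"):
                continue
            raw = tar.extractfile(member).read()
            # np.load needs a seekable file object, so buffer the bytes in memory.
            with np.load(io.BytesIO(raw)) as npz:
                arrays = {key: npz[key] for key in npz.files}
            yield member.name, arrays


# Example use (assumes a shard named 0001.tar sits directly in the directory):
# for name, arrays in iter_npz_from_shard("202001_202006/0001.tar"):
#     print(name, {key: value.shape for key, value in arrays.items()})
```

Because the shards are plain tar archives, they can also be read with tools that understand the WebDataset format; the version above is only meant to show what a shard contains.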
To generate a complete index file, run the script make_datasetall.py with the start date and the end date as arguments, each written as YYYYMMDD. For example:
python make_datasetall.py 20200101 20200630
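The dates in this example match the first and last day of the 202001_202006 directory. Assuming the same correspondence holds for the other directory, the following sketch derives both arguments from a directory name and builds the command line. The helper names date_range_for_directory and index_command are hypothetical, and the sketch only prints the commands rather than running the script.

```python
import calendar
from datetime import date


def date_range_for_directory(dir_name):
    """Turn 'YYYYMM_YYYYMM' into (start, end) strings in YYYYMMDD form."""
    first, last = dir_name.split("_")
    start = date(int(first[:4]), int(first[4:6]), 1)
    last_year, last_month = int(last[:4]), int(last[4:6])
    # monthrange returns (weekday of the first day, number of days in the month).
    last_day = calendar.monthrange(last_year, last_month)[1]
    end = date(last_year, last_month, last_day)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")


def index_command(dir_name, script="make_datasetall.py"):
    """Build the argument list for the index script (assumed usage)."""
    start, end = date_range_for_directory(dir_name)
    return ["python", script, start, end]


for name in ("202001_202006", "202007_202012"):
    print(" ".join(index_command(name)))
# python make_datasetall.py 20200101 20200630
# python make_datasetall.py 20200701 20201231
```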