Code for reproducing the results in Hwang et al. Improving Subseasonal Forecasting in the Western US with Machine Learning. Please execute all instructions, scripts, and notebooks from the base directory of the repository, i.e., the directory in which README.md is located.
The code was tested using Python 2.7 on Linux and macOS, and Anaconda 2.3.0. It makes use of the following Python 2.7 packages:
- pygrib: 2.0.2
- netCDF4: 1.2.4
- jpeg: 9b
- pandas: 0.20.3
- jupyter: 1.0.0
- scipy: 0.19.1
- py-earth: 0.1.0
- r: 3.1.2
- cdo: 1.8.2
- hdf5: 1.8.18
- pytables: 3.4.2
installed via the commands
conda install --channel https://conda.anaconda.org/conda-forge pygrib
conda install netCDF4
conda install jpeg
conda install pandas
conda install jupyter
conda install scipy
pip install https://github.com/jcrudy/py-earth/archive/master.zip
conda install -c r r
conda install -c conda-forge cdo
conda install -c conda-forge hdf5=1.8.18
conda install -c conda-forge pytables
After cloning the repository, please execute the following steps in preparation for generating forecasts.
- The folder data/fcstrodeo_nctemplates contains the template files provided by NOAA and the contest organizers to generate the forecasts. Within data, create two additional subfolders data/dataframes and data/forecast/cfsv2_2011-2018.
- Download the SubseasonalRodeo dataset from https://doi.org/10.7910/DVN/IHBANG and place it in data/dataframes.
- Download the Reconstructed Precipitation and Temperature CFSv2 Forecasts for 2011-2018 from https://doi.org/10.7910/DVN/CEFZLV. Place the files cfsv2_re-contest_tmp2m-56w.h5, cfsv2_re-contest_tmp2m-34w.h5, cfsv2_re-contest_prate-56w.h5 and cfsv2_re-contest_prate-34w.h5 in data/dataframes. Place the other files in data/forecast/cfsv2_2011-2018.
- For each of the four forecasting tasks with ground-truth identifier in {“contest_tmp2m”, “contest_precip”} and target horizon in {“34w”, “56w”}, create the feature and target data matrices used by several of our methods by executing the Jupyter notebook create_data_matrices.ipynb with
gt_id
set to equal to the ground-truth identifier andtarget_horizon
set equal to the target horizon.
To generate MultiLLR forecasts for a ground-truth identifier in {“contest_tmp2m”, “contest_precip”}, a target horizon in {“34w”, “56w”}, and all target dates, execute the Jupyter notebook batch_2011-2018_backward_stepwise.ipynb with gt_id
set to equal to the ground-truth identifier and target_horizon
set equal to the target horizon. This notebook, for each target date in 2011-2018, generates MultiLLR forecasts for the target date using the script 2011-2018_backward_stepwise.py. Since each target date job is long-running, we recommend submitting these jobs to a cluster by setting run_locally
to False
and setting batch_script
to your personal batch cluster submission script. Alternatively, you can run the jobs locally and sequentially by setting run_locally
to True
(in which case the setting of batch_script
is irrelevant).
To generate the AutoKNN forecasts for a ground-truth identifier in {“contest_tmp2m”, “contest_precip”} and a target horizon in {“34w”, “56w”},
- Execute the Jupyter notebook knn_step_1-compute_similarities.ipynb with
gt_id
set equal to the ground-truth identifier andtarget_horizon
set equal to the target horizon. This will compute and save the similarities between every pair of dates in the dataset. - Execute the Jupyter notebook knn_step_2-get_neighbor_predictions.ipynb with
gt_id
set equal to the ground-truth identifier andtarget_horizon
set equal to the target horizon. This will compute and save the predictions of the most similar viable neighbors of each target date in the dataset. - Execute the Jupyter notebook 2011-2018_regression.ipynb with
gt_id
set equal to the ground-truth identifier andtarget_horizon
set equal to the target horizon. This will carry out the AutoKNN weighted local least-squares regression onto the top nearest neighbor predictions, an intercept, and fixed lagged measurements and save forecasts for all 2011-2018 target dates.
To recreate the debiased CFSv2 skills for 2011-2018, run gen_cfsv2_skills_2011-2018.py.
To generate ensemble forecasts based on the predictions of MultiLLR, AutoKNN, and reconstructed debiased CFSv2, for a ground-truth identifier in {“contest_tmp2m”, “contest_precip”} and a target horizon in {“34w”, “56w”}, execute the Jupyter notebook ensemble_backward_stepwise_and_knn_regression.ipynb with gt_id
set equal to the ground-truth identifier and target_horizon
set equal to the target horizon.
After completing all of the previous steps, executing the scripts table_skills_contest_year_all_methods.ipynb and table_skills_by_year_all_methods.ipynb will generate LaTeX tables corresponding to Tables 1 and 2 in the paper.
- experiments_util.py: utility functions supporting experiments.
- fit_and_predict.py: functionality for fitting models and forming predictions.
- knn_util.py: supporting functionality for knn notebooks.
- skill.py: supporting functionality for evaluating predictions.
- stepwise_util.py: supporting functionality for stepwise regression.