Code for data extraction, preprocessing, and sub-seasonal forecasting (SSF) model training and evaluation used in He et al., "Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances."
The code is compatible with Python 3.6 and the following packages:
- numpy: 1.19.0
- pandas: 0.24.2
- joblib: 0.15.1
- pickle: protocol 4 (bundled with the Python standard library; no separate install needed)
- scipy: 1.5.0
- pytorch: 1.2.0
- scikit-learn (sklearn): 0.23.1
- xgboost: 1.0.2
- pytables: 3.5.2
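For convenience, the pinned versions above can be collected into a requirements file. The file name and the exact pip package names below (e.g. `torch` for pytorch and `tables` for pytables) are assumptions, not taken from the repo; `pickle` ships with Python and is omitted:

```text
# requirements.txt (assumed name)
numpy==1.19.0
pandas==0.24.2
joblib==0.15.1
scipy==1.5.0
torch==1.2.0
scikit-learn==0.23.1
xgboost==1.0.2
tables==3.5.2
```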
To set up the code and data:
- Clone the repo
- Create a virtual environment and install the Python packages listed above
- Download the raw data from the Google Drive folder (https://drive.google.com/drive/folders/1qeDOWDULW-F8HnLYPEzWe8tCdtF4xQYT?usp=sharing) and save it to the folder named "data"
- Revise the configuration file (cfg_target.py) to match the required settings
To run the experiments:
- Data loading and preprocessing:
  - Execute load_data.py to load the subset of data needed to generate forecasts
  - Execute run_preprocess.py to preprocess the covariates and the target variable separately
  - Alternatively, execute run_preprocess_map.py to z-score the covariates and convert them to square maps
  - Execute create_covariates_pca.py to concatenate the principal components of the covariates
  - Execute create_datasets.py to create training-validation sets and training-test sets
- Hyperparameter tuning: execute run_random_search.py to find the best parameters via random search
- Generate forecasts: execute main_experiments.py to train all the models and generate forecasts on the test sets
- Evaluate forecasts: execute run_evaluation.py to evaluate forecasting performance on both the training and test sets
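The z-scoring performed by run_preprocess_map.py amounts to standardizing each covariate per grid cell over the time axis. The helper below is a hypothetical sketch of that step (names and array layout are assumptions; the repo's real implementation lives in the preprocess folder):

```python
import numpy as np

def zscore_over_time(x, eps=1e-8):
    """Standardize a (time, lat, lon) array per grid cell over the time axis.

    Hypothetical helper sketching the z-scoring step; the repo's real
    implementation lives in the preprocess/ folder.
    """
    mean = x.mean(axis=0, keepdims=True)  # per-cell temporal mean
    std = x.std(axis=0, keepdims=True)    # per-cell temporal std
    return (x - mean) / (std + eps)       # eps avoids division by zero

# Example: 100 time steps of a covariate on a 4 x 4 square map
rng = np.random.default_rng(0)
fields = rng.normal(loc=5.0, scale=2.0, size=(100, 4, 4))
z = zscore_over_time(fields)
```

After standardization each grid cell has (approximately) zero mean and unit variance over time, so covariates on different physical scales become comparable.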
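The training-validation and training-test splits built by create_datasets.py are chronological for forecasting, not shuffled: the model trains on the past and is validated on the future. A minimal sketch of such a split (the fraction-based helper is hypothetical; the repo's actual split points are set in cfg_target.py):

```python
import numpy as np

def chronological_split(X, y, val_frac=0.2):
    """Split time-ordered samples into a past (training) block and a
    future (validation) block. Hypothetical helper: the repo's actual
    split points come from cfg_target.py, not a fixed fraction."""
    n_val = int(len(X) * val_frac)
    cut = len(X) - n_val
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# 10 time-ordered samples: the last 2 become the validation set
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
(X_train, y_train), (X_val, y_val) = chronological_split(X, y)
```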
The repository contains the following scripts:
- cfg_target.py: configuration file holding all user-specified parameters
- load_data.py: script for loading the subset of data specified in the configuration file (cfg_target.py)
- run_preprocess.py: script for data preprocessing
- run_preprocess_map.py: script for preprocessing covariates and converting them to square maps
- create_covariates_pca.py: script for concatenating PCs from all climate variables (covariates)
- create_datasets.py: script for creating training-validation sets and training-test sets
- run_random_search.py: script for hyperparameter tuning via random search
- main_experiments.py: script for running experiments for all models on training and test sets
- run_evaluation.py: script for evaluating the performance of forecasting models on training and test sets
The supporting code is organized into the following folders:
- data_load: functions for data loading
- preprocess: functions for data preprocessing
- hyperparameter_tuning: functions for random search
- forecasting: scripts for training forecasting models and evaluating them on test sets
- evaluation: functions for evaluation
- model: collection of models implemented in the experiments
- utils: utility functions
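The random-search tuning performed by run_random_search.py can be illustrated with scikit-learn's `RandomizedSearchCV`: sample a fixed number of hyperparameter configurations, cross-validate each, and keep the best. The toy data, model, and candidate grids below are assumptions for illustration only; the repo's own models and search spaces come from cfg_target.py and the model folder:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Toy regression data standing in for the preprocessed covariates and target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Candidate hyperparameter values (illustrative, not the paper's grids)
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=5,   # number of random configurations to try
    cv=3,       # 3-fold cross-validation per configuration
    random_state=0,
)
search.fit(X, y)
# search.best_params_ holds the winning configuration
```

For time-ordered forecasting data, the plain `cv=3` shuffle-free k-fold here would typically be replaced by a chronological validation scheme.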