Below you can find an outline of how to reproduce my solution for the M5 Forecasting - Accuracy competition. If you run into any trouble with the setup or code, or have any questions, please contact me at matthias.anderer@googlemail.com
The notebooks were run as Kaggle kernels / Colab notebooks - so we actually never executed the code locally.
- m5-simple-fe-evaluation.ipynb == feature preprocessing notebook
- m5-final-XX.ipynb == bottom level (lvl12) train and predict notebook
- m5-nbeats-toplvl.ipynb == top level (lvl1-5) train and predict notebook
- m5-alignandsubmit.ipynb == notebook to align bottom level with top level results
- gluonts-20200710T072218Z-001.zip == modified gluonts version (added lookahead to Trainer)
1) Make sure to add gluonts to the path of the N-BEATS notebook
unzip gluonts-20200710T072218Z-001.zip
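Inside the N-BEATS notebook, making the unzipped package importable could look roughly like the sketch below. The directory `./gluonts_src` and the helper name `add_package_path` are assumptions for illustration only; the notebook's own "package_path" setting is what actually has to be adapted.

```python
import sys
from pathlib import Path


def add_package_path(package_dir):
    """Prepend package_dir to sys.path so the local gluonts copy is found first."""
    resolved = str(Path(package_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical location: the directory that contains the unzipped "gluonts" folder.
gluonts_parent = add_package_path("./gluonts_src")
```

Prepending (rather than appending) is a deliberate choice here, so that the modified version takes precedence over any gluonts already installed in the environment.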
- Next you have to load the competition data from Kaggle (the commands below assume you use the Kaggle CLI)
(Again make sure to adapt the paths to your environment)
mkdir input
cd input
kaggle competitions download -c m5-forecasting-accuracy
cd ..
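The download typically arrives as a single zip archive that still has to be unpacked before the notebooks can read the CSV files. A minimal sketch of that step, assuming the archive is named `m5-forecasting-accuracy.zip` and sits in `input/` (check what the download actually produced):

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir and return the sorted member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Assumed archive name and location; nothing happens if the file is not there.
archive_path = Path("input/m5-forecasting-accuracy.zip")
if archive_path.exists():
    print(extract_archive(archive_path, "input"))
```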
Before running please change the path settings in the m5-simple-fe-evaluation.ipynb notebook to match your local environment
(Search for "Load Data" in notebook)
jupyter notebook m5-simple-fe-evaluation.ipynb
The resulting files grid_part_1.pkl, grid_part_2.pkl, grid_part_3.pkl will be created in the working directory and are the input for the next step
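Before moving on, it can help to confirm that all three feature files were really written. The following is only a convenience check, not part of the original notebooks; the helper name `missing_feature_files` is made up for this sketch.

```python
from pathlib import Path

# File names as listed above.
FEATURE_FILES = ("grid_part_1.pkl", "grid_part_2.pkl", "grid_part_3.pkl")


def missing_feature_files(directory, names=FEATURE_FILES):
    """Return the expected feature files that are not present in directory."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]


missing = missing_feature_files(".")
if missing:
    print("Missing feature files:", ", ".join(missing))
```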
The notebook m5-final-XX.ipynb has to be run once for each loss multiplier that shall be tested.
Before each run, the LOSS_MULTIPLIER variable has to be set to the desired value.
(The values used for ensembling the final submission to the competition were: 0.9, 0.93, 0.95, 0.97, 0.99 - so to reproduce the results you will need to run the notebook once for each of these values)
Before running please change the path settings in the m5-final-XX.ipynb notebook to match your local environment
Search for "#PATHS for Features" in notebook - here the path has to match the output directory of the data processing step
jupyter notebook m5-final-XX.ipynb
The notebook will create temporary model and test data files that you should delete after each run
rm *.pkl
rm *.bin
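Note that the shell pattern `*.pkl` also matches grid_part_1.pkl, grid_part_2.pkl and grid_part_3.pkl if they are in the same directory, and those are still needed for the next multiplier. A hedged alternative that keeps them is sketched below; it assumes the temporary files land in the working directory, and the helper name `remove_temp_files` is hypothetical.

```python
from pathlib import Path

# Feature files from the preprocessing step that must survive the cleanup.
KEEP = {"grid_part_1.pkl", "grid_part_2.pkl", "grid_part_3.pkl"}


def remove_temp_files(directory, patterns=("*.pkl", "*.bin"), keep=KEEP):
    """Delete files matching patterns in directory, except those named in keep."""
    removed = []
    for pattern in patterns:
        for path in Path(directory).glob(pattern):
            if path.is_file() and path.name not in keep:
                path.unlink()
                removed.append(path.name)
    return sorted(removed)


# Example call (destructive, so it is left commented out):
# remove_temp_files(".")
```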
The notebook will create a "submission_vX.csv" file that contains the bottom level predictions for the chosen loss multiplier
FOR THIS TO WORK LOCALLY YOU WILL HAVE TO RENAME THIS OUTPUT FILE TO A UNIQUE NAME BEFORE RE-RUNNING THIS STEP WITH A DIFFERENT MULTIPLIER, OTHERWISE IT WILL BE OVERWRITTEN
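One possible way to do the renaming is to append the multiplier to the file name after each run. The `_lm<value>` suffix and the helper `archive_submission` are assumptions for this sketch; any unique naming scheme works, as long as the paths in the alignment notebook are adapted to it.

```python
from pathlib import Path


def archive_submission(filename, loss_multiplier, directory="."):
    """Rename e.g. name.csv to name_lm0.95.csv and return the new path."""
    src = Path(directory) / filename
    dest = src.with_name(f"{src.stem}_lm{loss_multiplier}{src.suffix}")
    if dest.exists():
        # Refuse to clobber the result of an earlier run.
        raise FileExistsError(f"{dest} already exists")
    src.rename(dest)
    return dest


# Example call after a run with LOSS_MULTIPLIER = 0.95; pass the actual
# submission file name written by the notebook:
# archive_submission(actual_submission_name, 0.95)
```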
- The notebook will try to install the required mxnet version as well as pydantic and ujson - if you do not want this to happen, comment out the respective pip install lines in the notebook before running.
- Before running please make sure that the gluonts package path matches your local setup: Search for "package_path" in the notebook
- Before running please make sure that the m5_input_path matches your local setup: Search for "m5_input_path" in the notebook
- Before running please make sure that the data paths match your local setup: Search for "Load data" in the notebook
- Before running please make sure that the output paths match your desired local setup: Search for "to_csv" in the notebook
- You will need a CUDA/GPU environment for running this notebook - please make sure that the mxnet pip install matches your CUDA and MKL setup (cu101mkl is for CUDA 10.1 with MKL)
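To illustrate the naming used in the last point: the package suffix encodes the CUDA version without the dot. The sketch below assumes that other CUDA versions follow the same `cu<major><minor>mkl` pattern as the cu101mkl example; whether a package with the resulting name is actually published has to be checked on PyPI. The helper name `mxnet_package_name` is hypothetical.

```python
def mxnet_package_name(cuda_version, use_mkl=True):
    """Build a pip package name such as 'mxnet-cu101mkl' from a CUDA version like '10.1'."""
    parts = cuda_version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '10.1', got {cuda_version!r}")
    major, minor = parts[:2]
    suffix = "mkl" if use_mkl else ""
    return f"mxnet-cu{major}{minor}{suffix}"


print(mxnet_package_name("10.1"))  # mxnet-cu101mkl
```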
(In the competition we ran the NBEATS top level twice: once with the configuration in the notebook, whose output is referred to as "nbeats_toplvl_forecasts2.csv", and once with the setting "epochs=10" in the Trainer object (search for "Trainer(" in the notebook), whose output is referred to as "nbeats_toplvl_forecasts1.csv")
( IF YOU WANT TO USE DIFFERENT NBEATS REFERENCE PREDICTIONS MAKE SURE TO HAVE UNIQUE OUTPUT FILENAMES OTHERWISE THE OUTPUT FILE WILL BE OVERWRITTEN ON RE-RUN )
jupyter notebook m5-nbeats-toplvl.ipynb
Finally the bottom level forecasts have to be aligned with the top level forecasts
- Before running please make sure that the paths for the NBEATS reference predictions match your local setup (i.e. the output files you created in the top level model step) : See "nbeats_pred01_df" and "nbeats_pred02_df" in the notebook
- Before running please make sure that the paths for the bottom level model predictions match your local setup (i.e. the output files you created in the bottom level model step) : See "if BUILD_ENSEMBLE:"
- Before running please make sure that the paths for the competition data match your local setup: Search for "validation_gt_data = pd.read_csv" in the notebook
jupyter notebook m5-alignandsubmit.ipynb