In addition to the Anaconda libraries, you need to install altair, vega_datasets, category_encoders, mxnet, gluonts, kats, lightgbm, hyperopt and pandarallel.
kats requires Python 3.7 or higher.
Competition, Datasets and Evaluation
The M5 Competition aims to forecast daily sales for the next 28 days, based on sales over the previous 1,941 days, for 30,490 item-store IDs across Walmart stores.
Data includes (i) time series of daily sales quantity by ID, (ii) sales prices, and (iii) holiday and event information.
Evaluation uses the Weighted Root Mean Squared Scaled Error (WRMSSE). A detailed explanation is given in the M5 Participants Guide, and an implementation is available at this link.
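The scaled error underlying WRMSSE can be sketched in a few lines. This is an illustrative implementation, not the official M5 code; the `rmsse`/`wrmsse` helper names and the toy per-series weighting are assumptions.

```python
import numpy as np

def rmsse(y_train, y_true, y_pred):
    # Root Mean Squared Scaled Error: forecast MSE scaled by the
    # in-sample MSE of the one-step naive forecast on the training data.
    y_train, y_true, y_pred = map(np.asarray, (y_train, y_true, y_pred))
    scale = np.mean(np.diff(y_train) ** 2)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2) / scale))

def wrmsse(weights, series_triples):
    # Weighted RMSSE: a weighted sum of per-series RMSSE values.
    # In M5 the weights come from each series' recent dollar-sales share.
    return sum(w * rmsse(tr, yt, yp)
               for w, (tr, yt, yp) in zip(weights, series_triples))
```

In the real metric the weights are computed from cumulative dollar sales over the last 28 in-sample days, so high-revenue series dominate the score.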
For hyperparameter tuning, 0.1% of IDs were randomly sampled, and a separate 1% sample was used to measure test-set performance.
Algorithms
Kats: Prophet
Prophet can incorporate forward-looking related time series into the model, so additional features were created with holiday and event information.
Since a separate Prophet model must be fitted for each ID, fitting would normally go through the pandas DataFrame's apply function; instead, I used pandarallel to maximize parallelization performance.
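A minimal sketch of this per-ID pattern, with a dummy fitting function standing in for an actual Kats Prophet fit (the function name and toy data are illustrative). With pandarallel installed, only the call site changes from `apply` to `parallel_apply`:

```python
import pandas as pd

def fit_forecast(group):
    # stand-in for fitting a model per ID; here we just return the
    # group's mean sales as a dummy one-step forecast
    return group["sales"].mean()

df = pd.DataFrame({
    "id": ["A", "A", "B", "B"],
    "sales": [1.0, 3.0, 10.0, 20.0],
})

# serial version: one fit per ID via groupby-apply
forecasts = df.groupby("id")[["sales"]].apply(fit_forecast)

# parallel version (assumption: pandarallel is installed):
# from pandarallel import pandarallel
# pandarallel.initialize(progress_bar=False)
# forecasts = df.groupby("id")[["sales"]].parallel_apply(fit_forecast)
```

Because each ID's fit is independent, the work parallelizes across cores with no change to the fitting function itself.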
Prophet hyperparameters were tuned through 3-fold CV using the Bayesian optimization module built into the Kats library, with Tweedie applied as the loss function. The tuned values are shown below.
| Hyperparameter | Value |
| --- | --- |
| seasonality_prior_scale | 0.01 |
| changepoint_prior_scale | 0.046 |
| changepoint_range | 0.93 |
| n_changepoints | 5 |
| holidays_prior_scale | 100.00 |
| seasonality_mode | multiplicative |
In the figures below, black dots show actual sales, blue lines and bands show point predictions with confidence intervals, and red dotted lines mark the test period.
Kats: VAR
Since VAR is a multivariate time series model, the more IDs it fits simultaneously, the better the performance, but the memory requirement grows rapidly, roughly quadratically in the number of series.
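A back-of-the-envelope way to see that growth is to count the coefficients of a standard VAR(p); the `var_param_count` helper below is my own illustration, not Kats code.

```python
def var_param_count(k: int, p: int) -> int:
    # A VAR(p) over k series has, for each of the k equations,
    # an intercept plus p lag-coefficient matrices of size k x k,
    # so the total parameter count grows quadratically in k.
    return k * (k * p + 1)

# doubling the number of jointly fitted series roughly
# quadruples the parameter count (here with weekly lag order p=7):
counts = [var_param_count(k, p=7) for k in (10, 20, 40)]
```

At M5 scale (tens of thousands of IDs), fitting all series in one VAR is therefore infeasible, which forces fitting in smaller batches.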
GluonTS: DeepAR
DeepAR can incorporate metadata and forward-looking related time series into the model, so additional features were created from sales prices and holiday/event information. Dynamic categorical variables were converted to numeric features through feature hashing.
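Feature hashing can be sketched as follows. This is a deterministic illustration using `hashlib`; the helper names and the bucket count are assumptions, not the encoder actually used here.

```python
import hashlib

def hash_bucket(value: str, n_buckets: int = 16) -> int:
    # hashing trick: map a category (e.g. an event name) to one of
    # n_buckets indices without storing a vocabulary; md5 keeps the
    # mapping deterministic across runs, unlike Python's built-in hash()
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def hashed_one_hot(value: str, n_buckets: int = 16) -> list:
    # fixed-width indicator vector over the hashed buckets
    vec = [0] * n_buckets
    vec[hash_bucket(value, n_buckets)] = 1
    return vec
```

The fixed output width is what makes this convenient for dynamic categorical features: new event names never change the feature dimensionality, at the cost of occasional hash collisions.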
Choosing the output probability distribution is a critical hyperparameter; here it was set to the negative binomial distribution.
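A small numpy illustration of why the negative binomial suits sparse sales counts: its variance can exceed its mean (overdispersion), unlike the Poisson, whose variance equals its mean. The parameters below are arbitrary, not the tuned model's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Negative binomial samples are non-negative integers, matching daily
# unit sales; with n=2, p=0.2 the theoretical mean is 8 and the
# theoretical variance is 40, i.e. strongly overdispersed.
samples = rng.negative_binomial(n=2, p=0.2, size=100_000)
```

Intermittent retail demand (many zeros, occasional spikes) is typically overdispersed, so a Poisson or Gaussian output head tends to fit it poorly.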
GluonTS: DeepVAR
DeepVAR, the multivariate counterpart, offers only a limited choice of output probability distributions (e.g., the multivariate Gaussian), which leads to a decrease in performance on count-valued sales data.
LightGBM
I used tsfresh to convert the time series into structured tabular features, which consumes a lot of computational resources even with minimal settings.
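For intuition, here is a hand-rolled stand-in for that kind of feature extraction, turning a sales window into a flat feature dict; this is not tsfresh's actual feature set, and the names are illustrative.

```python
import pandas as pd

def basic_features(sales: pd.Series) -> dict:
    # summary statistics describing a recent sales window, usable as
    # one row of tabular input for a gradient-boosting model
    return {
        "mean": float(sales.mean()),
        "std": float(sales.std()),
        "minimum": float(sales.min()),
        "maximum": float(sales.max()),
        "last": float(sales.iloc[-1]),
    }

feats = basic_features(pd.Series([0.0, 2.0, 4.0]))
```

tsfresh computes hundreds of such features per series (autocorrelations, entropy, trend coefficients, and so on), which is where the computational cost comes from.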
A LightGBM Tweedie regression model was fitted. Hyperparameters were tuned via 3-fold CV using the Bayesian optimization function of the hyperopt library. The tuned values are shown below.
| Hyperparameter | Value |
| --- | --- |
| boosting | gbdt |
| learning_rate | 0.01773 |
| num_iterations | 522 |
| num_leaves | 11 |
| min_data_in_leaf | 33 |
| min_sum_hessian_in_leaf | 0.0008 |
| bagging_fraction | 0.5297 |
| bagging_freq | 4 |
| feature_fraction | 0.5407 |
| extra_trees | False |
| lambda_l1 | 2.9114 |
| lambda_l2 | 0.2127 |
| path_smooth | 217.3879 |
| max_bin | 1023 |
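The tuned values map directly onto a LightGBM parameter dictionary, e.g. for `lgb.train`. This is a sketch: the `"objective": "tweedie"` entry reflects the Tweedie regression described above and is not part of the tuning table itself.

```python
# Tuned LightGBM configuration as a params dict.
params = {
    "objective": "tweedie",   # Tweedie loss, as used in this experiment
    "boosting": "gbdt",
    "learning_rate": 0.01773,
    "num_iterations": 522,
    "num_leaves": 11,
    "min_data_in_leaf": 33,
    "min_sum_hessian_in_leaf": 0.0008,
    "bagging_fraction": 0.5297,
    "bagging_freq": 4,
    "feature_fraction": 0.5407,
    "extra_trees": False,
    "lambda_l1": 2.9114,
    "lambda_l2": 0.2127,
    "path_smooth": 217.3879,
    "max_bin": 1023,
}
# with lightgbm installed, training would look like:
# booster = lgb.train(params, train_set)
```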
Forecasting was recursive: the sales forecast for day D+1 was fed back through feature engineering to predict day D+2, and this iterative process was repeated to measure performance over the full 28-day test set.
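The recursive loop can be sketched as follows; the one-step model here is a hypothetical stand-in (a trailing 7-day mean), not the actual LightGBM pipeline.

```python
import numpy as np

def recursive_forecast(history, predict_one_step, horizon=28):
    # Feed each one-step forecast back into the history so the features
    # for day D+2 can be built from the D+1 prediction, and so on.
    history = list(history)
    preds = []
    for _ in range(horizon):
        yhat = predict_one_step(history)
        preds.append(yhat)
        history.append(yhat)
    return preds

# hypothetical one-step model: mean of the last 7 observations
preds = recursive_forecast([3.0] * 30, lambda h: float(np.mean(h[-7:])))
```

A known drawback of this scheme is error accumulation: any bias in the D+1 forecast propagates into every later step's features.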
Algorithms Performance Summary
| Algorithm | WRMSSE | sMAPE | MAE | MASE | RMSE |
| --- | --- | --- | --- | --- | --- |
| DeepAR | 0.7513 | 1.4200 | 0.8795 | 0.9269 | 1.1614 |
| LightGBM | 1.0701 | 1.4429 | 0.8922 | 0.9394 | 1.1978 |
| Prophet | 1.0820 | 1.4174 | 1.1014 | 1.0269 | 1.4410 |
| VAR | 1.2876 | 2.3818 | 1.5545 | 1.6871 | 1.9502 |
| Naive Method | 1.3430 | 1.5074 | 1.3730 | 1.1077 | 1.7440 |
| Mean Method | 1.5984 | 1.4616 | 1.1997 | 1.0708 | 1.5352 |
| DeepVAR | 4.6933 | 4.6847 | 1.9201 | 1.3683 | 2.3195 |
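The two baseline methods in the comparison can be stated in a few lines (a sketch; the function names are mine).

```python
import numpy as np

def naive_forecast(y_train, horizon=28):
    # naive method: repeat the last observed value over the horizon
    return np.full(horizon, float(y_train[-1]))

def mean_forecast(y_train, horizon=28):
    # mean method: repeat the historical average over the horizon
    return np.full(horizon, float(np.mean(y_train)))
```

These baselines matter for reading the table: any model that fails to beat them (as VAR and DeepVAR largely do here) is adding complexity without forecasting value.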
As a result, DeepAR was selected, and its predictions were submitted to Kaggle, achieving a WRMSSE of 0.8112 on the private leaderboard.