This repository provides the code for TSBench, the most comprehensive evaluation of forecasting methods to date. TSBench is presented in the following paper:
The code in this repository provides the following features:
- It provides well-structured code to benchmark forecasting methods on AWS Sagemaker. The repository readily allows the evaluation of 13 time series forecasting methods included in GluonTS across 44 publicly available datasets. Evaluations are performed with respect to multiple performance measures and all forecasts generated for the test set are stored.
- The evaluations of these 13 forecasting methods on all 44 datasets are publicly available on the Registry of Open Data on AWS.
- It allows you to evaluate ensembles of models quickly by using the forecasts stored for the test sets.
- It enables you to train surrogate models that learn from the performance of forecasting methods across datasets. The surrogate models then allow you to select models and hyperparameters for unseen datasets while considering multiple objectives (e.g. accuracy and latency).
Prior to installing the project, you may want to install all dependencies (Python, CUDA, Poetry). If you are running on an AWS EC2 instance with Ubuntu 20.04, you can use the provided bash script:
bash bin/setup-ec2.sh
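If you prefer to set up the dependencies yourself (or are not on Ubuntu 20.04), a minimal sketch might look like the following; the package selection is an assumption and leaves CUDA out, so adjust it to your environment:

```bash
# A rough manual alternative (assumptions: Ubuntu 20.04, Python 3.8, and that
# CUDA is either already installed or not required for your experiments)
sudo apt-get update && sudo apt-get install -y python3.8 python3.8-venv python3-pip
curl -sSL https://install.python-poetry.org | python3 -
export PATH="$HOME/.local/bin:$PATH"  # make the poetry executable discoverable
```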
In order to use the code in this repository, you should first clone the GluonTS repository and then go into the directory of this project:
git clone git@github.com:awslabs/gluon-ts.git
cd gluon-ts/src/gluonts/nursery/tsbench
Then, in the root of the repository, you can install all dependencies via Poetry:
poetry install
Note: TSBench does not currently run on Apple Silicon devices as multiple dependencies are unavailable.
The main way of interacting with the code in this repository is the `tsbench` CLI. After running `poetry install` and `poetry shell` in the root of the TSBench repository, you can get an overview by running the help command:
tsbench --help
The in-depth examples below also make heavy use of the CLI.
As noted above, the evaluations of 13 forecasting methods (along with various hyperparameter settings) across all 44 datasets included in this repository are publicly available on the Registry of Open Data on AWS. You can easily download these evaluations using the CLI:
tsbench evaluations download
This command only downloads the performance metrics (i.e. accuracy, latency of forecasts, etc.) and does not download the generated forecasts. This allows you to download only ~20 MiB of data. If you actually want to access the forecasts, you can pass an additional flag which will download roughly 600 GiB of data:
tsbench evaluations download --include_forecasts
All downloaded data will be available at `~/evaluations`. You can customize this path by setting `--evaluations_path`; note that you will then need to set this path manually when executing many other commands.
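For example, to place the metrics somewhere other than the default location (the path below is just a placeholder):

```bash
# Download the performance metrics to a custom directory instead of ~/evaluations
tsbench evaluations download --evaluations_path /data/tsbench/evaluations
```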
The notebooks in the examples folder will guide you through the usage of locally available evaluations. See below for more context on the content of these example notebooks.
One of the main purposes of this repository is to easily benchmark forecasting methods on various datasets. In the following, we want to guide you through the entire process.
First, you should set up an EC2 instance (since you will be dealing with plenty of network traffic). In order to use all functionalities, make sure to attach the following permission policies to the instance's IAM role (see the sketch after the list below for one way to do this via the AWS CLI):
AmazonEC2ContainerRegistryFullAccess
AmazonSageMakerFullAccess
AmazonS3FullAccess
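If you manage the instance's role via the AWS CLI, attaching the policies might look roughly like this; the role name `tsbench-ec2` is a placeholder for the role in your instance profile:

```bash
# Attach the three managed policies to the EC2 instance's IAM role
aws iam attach-role-policy --role-name tsbench-ec2 \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam attach-role-policy --role-name tsbench-ec2 \
    --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name tsbench-ec2 \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```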
On this EC2 instance, you should now clone the repository and install its dependencies.
Before evaluating forecasting methods, you need to prepare the benchmark datasets. To download all datasets that are bundled in TSBench, you need to have a Kaggle account and you need to add your API token at `~/.kaggle/kaggle.json`. Prior to using the `tsbench` CLI, you have to download some datasets via the Kaggle API. For this, run the following script (which potentially requires you to go into your browser to accept terms of use):
bash bin/download-kaggle.sh
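The script relies on the Kaggle API token mentioned above. If you have not configured it on this machine before, placing the token might look roughly like this (a sketch; the token file is the one you download from your Kaggle account settings):

```bash
mkdir -p ~/.kaggle
cp /path/to/downloaded/kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json  # the Kaggle CLI warns about overly permissive tokens
```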
Afterwards, you can run the following commands (assuming that you have executed `poetry shell`):
# Download and preprocess all datasets
tsbench datasets download
# Upload locally available datasets to your S3 bucket
tsbench datasets upload --bucket <your_bucket_name>
Remember the name of the bucket that you used here. You will need it later!
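If the bucket does not exist yet, you can create it with the AWS CLI before uploading (a sketch; S3 bucket names must be globally unique):

```bash
aws s3 mb s3://<your_bucket_name>
# Optionally, verify that the datasets arrived after the upload
aws s3 ls s3://<your_bucket_name> --recursive --summarize | tail -n 2
```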
As evaluations are scheduled on AWS Sagemaker, you will need to ensure that the IAM role that runs the evaluations can access the data in the bucket that you just created. For this, create an IAM role which has (at least) the following policies attached (see the sketch after the list below for one way to create such a role via the AWS CLI):
AmazonSageMakerFullAccess
AmazonS3FullAccess
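A minimal sketch for creating such a role with the AWS CLI follows; the role name `tsbench-sagemaker` is a placeholder, and the trust policy must allow Sagemaker to assume the role. You will need the resulting role ARN when scheduling evaluations later.

```bash
# Trust policy that lets Sagemaker assume the role
cat > /tmp/sagemaker-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "sagemaker.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role \
    --role-name tsbench-sagemaker \
    --assume-role-policy-document file:///tmp/sagemaker-trust.json
aws iam attach-role-policy --role-name tsbench-sagemaker \
    --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name tsbench-sagemaker \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```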
As training jobs on AWS Sagemaker run in Docker containers, you will need to build your own container and upload it to the ECR registry. For this, you must first create an ECR repository named `tsbench`. Then, you can build and upload the container using the following utility script:
bash bin/build-container.sh
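If you have not created the `tsbench` ECR repository yet, one way to do so is via the AWS CLI (a sketch; make sure the region matches the one you use for Sagemaker):

```bash
aws ecr create-repository --repository-name tsbench --region <your_region>
```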
At this point, you can finally schedule your evaluations. The `configs/benchmark` directory contains the full grid search over methods and datasets that is included in the publicly available evaluations. If you want to re-run these evaluations, you can simply execute the following:
tsbench evaluations schedule \
--config_path configs/benchmark \
--sagemaker_role <arn_of_your_sagemaker_role> \
--experiment <your_experiment_name> \
--data_bucket <your_bucket_name> \
--output_bucket <your_bucket_name>
This will schedule all evaluations on your AWS Sagemaker account and group them by the experiment name. After they have completed successfully, you can download them into your local evaluation directory:
tsbench evaluations download --experiment <your_experiment_name>
Again, this only downloads the performance metrics. If you actually want to use the forecasts, add the `--include_forecasts` flag.
Note: As this command may run for a long time (depending on the number of evaluations you run and your AWS Sagemaker quotas), it is a good idea to run this command in a `tmux` session.
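For example (a sketch; the session name is arbitrary):

```bash
tmux new-session -s tsbench     # start a named session
# ... run the long-running tsbench command inside the session ...
# detach with Ctrl-b d, then re-attach later with:
tmux attach -t tsbench
```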
If you want to run evaluations for your own dataset, you can easily add it to the registry by editing the dataset definitions file. Consult the implementation of the included datasets to get an idea of how to add your own.
By passing a unique key for your dataset to the `register` class decorator, your dataset is readily usable across all CLI commands and scripts.
In order to use time series forecasting methods that are not included in this repository, you can edit the model definitions file. If you want to use newly published estimators from GluonTS, edit the dependencies in `pyproject.toml` to use a different GluonTS version.
If your new model defines non-standard hyperparameters (i.e. hyperparameters other than the training time or the learning rate), you should add them as options to the evaluation script. The option should take the form `--<model_key>_<hyperparameter_name>`.
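As a purely hypothetical illustration (neither the model key nor the hyperparameter below exist in the repository):

```bash
# A model registered under the key "my_model" with a hyperparameter
# "context_length" would be configured on the evaluation script via
#   --my_model_context_length 48
```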
The examples directory provides additional usage examples of the code in this repository:
- `browse-offline-evaluations.ipynb` explains how to access evaluations that have been downloaded
- `train-a-recommender.ipynb` discusses how you can train a recommender which is able to provide multi-objective recommendations of forecasting models and hyperparameters for unseen datasets
- `evaluate-ensemble-performance.ipynb` goes through the process of simulating the performance of an ensemble using locally available forecasts
- `analyze-surrogate-performance.ipynb` shows how you can assess the performance of surrogate models which learn from the offline evaluations
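To open these notebooks locally, you can start Jupyter from within the Poetry environment (a sketch; this assumes Jupyter is available in, or added to, the environment):

```bash
poetry run jupyter notebook examples/
```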