Time series are ubiquitous in today’s data-driven world. Given historical data, time series forecasting (TSF) as a long-standing task aims to predict future multi-step data. It has a wide range of applications, including but not limited to traffic flow estimation, energy management, economics investment, weather forecasting, and disease predictions.
This repository collects public benchmark datasets for Time Series Forecasting (TSF), in which all the datasets are well pre-processed and can be used directly. According to whether to consider the correlations between sensors (nodes), we roughly divide them into two categories as follows:
(1) Without Considering Correlation Between Sensors [Google Drive] [Baidu Wangpan(Password: )];
(2) Considering Correlation Between Sensors [Google Drive] [Baidu Wangpan(Password: )].
- ETT (Electricity Transformer Temperature):
This dataset consists of two
hourly-level
datasets (ETTh) and two15-minute-level
datasets (ETTm). Each of them contains one oil and six load features of electricity transformers from July 2016 to July 2018. - Traffic:
This dataset from the California Department of Transportation describes the road occupancy rates between 0 and 1. It contains the
hourly
data recorded by the sensors of San Francisco Bay area freeways from 2015.01 to 2016.12. - Electricity:
This dataset collects the
hourly
electricity consumption in KWh of 321 clients from 2012 to 2014. Note that, the raw dataset records every15 minutes
from 2011 to 2014 while some dimensions of data are equal to 0. Thus, the records in 2011 are removed. - Exchange-Rate:
This dataset collects the
daily
exchange rates of 8 countries, including Australia, British, Canada, Switzerland, China, Japan, New Zealand and Singapore, from 1990.01 to 2016.12. - Weather:
This dataset includes 21 indicators of weather, such as air temperature, and humidity. Its data is recorded every
10 min
for 2020 in Germany. - ILI (Influenza-like Illness):
This dataset describes the ratio of patients seen with influenza-like illness and the total number of the patients. It includes the
weekly
data from the Centers for Disease Control and Prevention of the United States from 2002 to 2021.
-
PEMS
PEMS03, PEMS04, PEMS07 and PEMS08 are collected by California Transportation Agencies (CalTrans) Performance Measurement System (PeMS) in real time every 30 seconds. The collected data is aggregated to 5 minutes, which means there are 12 points in the flow data for each hour. They can be downloaded from this Repo.
-
PEMS03: It contaions three months of statistics on traffic flow ranging from Sept. 1st 2018 to Nov. 30th 2018, including 358 sensors. The total number of observed traffic data points is 26,208.
-
PEMS04: It contaions two months of statistics on traffic flow, traffic speed and traffic occupancy rate ranging from Jan. 1st 2018 to Feb. 28th 2018, including 307 sensors. The total number of observed traffic data points is 16,992.
-
PEMS07: It contaions three months of statistics on traffic flow ranging from May 1st 2017 to Aug. 31st 2017, including 883 sensors. The total number of observed traffic data points is 28,224.
-
PEMS08: It contaions three months of statistics on traffic flow, traffic speed and traffic occupancy rate ranging from July 1st 2016 to Aug. 31st 2016, including 170 sensors. The total number of observed traffic data points is 17,856.
-
-
PEMS-BAY: This dataset is collected by California Transportation Agencies (CalTrans) Performance Measurement System (PeMS), and contains six months of statistics on traffic speed, ranging from Jan 1st 2017 to May 31th 2017, including 325 sensors in the Bay area. The total number of observed traffic data points is 16,937,179.
-
METR-LA: This dataset from loop detectors in the highway of Los Angeles County records four
months
of statistics on traffic speed, ranging from Mar 1st 2012 to Jan 30th 2012, including 207 sensors on the highways of Los Angeles County. The total number of observed traffic data points is 6,519,002.
-
PEMS
Dataset PEMS03 PEMS04 PEMS07 PEMS08 # of nodes 358 307 883 170 # of timesteps 26,208 16,992 28,224 17,856 # Granularity 5 mins 5 mins 5 mins 5 mins # Start time Sept. 1st 2018 Jan. 1st 2018 May 1st 2017 July 1st 2016 # End time Nov. 30th 2018 Feb. 28th 2018 Aug. 31st 2017 Aug. 31st 2016 Signals F F,S,O F F,S,O In column titled “Signals”, F represents traffic flow, S represents traffic speed, and O represents traffic occupancy rate.
To evaluate the performance of models, there are three commonly used metrics, including RMSE, MAE, and MAPE.
Suppose
- Mean Absolute Error (MAE):
- Root Mean Square Error (RMSE):
- Mean Absolute Percentage Error (MAPE):
- [1]. https://github.com/laiguokun/multivariate-time-series-data
- [2]. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
- [3]. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
- [4]. Multi-Range Attentive Bicomponent Graph Convolutional Network for Traffic Forecasting
- [5]. Are Transformers Effective for Time Series Forecasting?
- [6]. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting
- [7]. Learning Dynamics and Heterogeneity of Spatial-Temporal Graph Data for Traffic Forecasting