
FPPformerV2: EMD-Based Short Input Long Sequence Time-Series Forecasting

Python 3.11 PyTorch 2.1.0 License CC BY-NC-SA

This is the original PyTorch implementation of FPPformerV2 in the following paper: [FPPformerV2: EMD-Based Short Input Long Sequence Time-Series Forecasting] (manuscript submitted to IEEE TNNLS).

Model Architecture

The schematic in Figure 1 unveils the architecture of FPPformerV2. Compared with the former version, its encoder is equipped with a novel attention mechanism. It is dubbed IEMD attention as it extracts the inter-relationships of different variables on the basis of EMD, which plays the role of a discriminator that determines whether an arbitrary variable pair possesses an underlying inter-relationship or not. IEMD attention is arranged at the end of each encoder stage, accompanied by a conventional feed-forward layer, to maintain the hierarchical architecture of the encoder and to exploit the fully extracted sequence features of each variable provided by the preceding element-wise and patch-wise attention. The inter-relationships of different variables in IEMD attention are extracted at the patch level, rather than over the entire sequence, to economize on computational cost. Besides, the decoder receives a hybrid of seasonal signals, whose periods are identified from the IMFs of the input sequences, in lieu of a simple zero-initialized tensor. Instance normalization, a prevailing technique proposed by T. Kim et al., is applied to this decoder input, as it is to the encoder input, to ensure that the input and prediction sequences share an identical distribution. IEMD attention is no longer deployed in the decoder since the encoder has already extracted the inter-relationships of the input sequences across all variables, whose existence is determined by the dominant periodic ingredients of each input sequence. Meanwhile, these dominant periodic ingredients also constitute the decoder input, making IEMD attention redundant in the decoder.

As a whole, on the basis of EMD, the self-attention modules in the FPPformerV2 encoder extract the parametric global input sequence features shared by all time-series sequences, as well as the dynamic cross-variable inter-relationships, while the FPPformerV2 decoder receives the non-parametric local input sequence features, which vary with different input sequences. The global features and the local features interact with each other in the cross-attention modules of the decoder, endowing FPPformerV2 with the property of global-local forecasting.



Figure 1. The architecture of FPPformerV2. Two improvements to the former version are highlighted in red.
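
As a rough illustration of the EMD-based decoder initialization described above, the sketch below decomposes one input variable into IMFs with the EMD-signal package and identifies the dominant period of each IMF via FFT; a hypothetical seasonal_init helper then mixes sinusoids at those periods to form a seasonal decoder input. This is only a conceptual sketch under assumed conventions, not the exact procedure implemented in this repository.

import numpy as np
from PyEMD import EMD  # provided by the EMD-signal package


def dominant_period(signal: np.ndarray) -> int:
    """Return the period (in time steps) of the strongest non-DC frequency."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0                          # ignore the DC component
    peak = int(np.argmax(spectrum))
    if peak == 0:
        return len(signal)                     # degenerate case: no clear periodicity
    freqs = np.fft.rfftfreq(len(signal))
    return int(round(1.0 / freqs[peak]))


def seasonal_init(x: np.ndarray, pred_len: int, max_imfs: int = 3) -> np.ndarray:
    """Hypothetical helper: build a seasonal initialization for one variable.

    The first IMFs returned by EMD are the highest-frequency components; their
    dominant periods are reused as the periods of the seasonal mixture.
    """
    imfs = EMD().emd(x)                        # shape: (n_imfs, input_len)
    t = np.arange(len(x), len(x) + pred_len)   # continue the time index into the horizon
    init = np.zeros(pred_len)
    for imf in imfs[:max_imfs]:
        period = dominant_period(imf)
        init += np.std(imf) * np.sin(2 * np.pi * t / period)
    return init


# Illustrative usage on a synthetic signal with daily (24) and weekly (168) periods.
x = np.sin(2 * np.pi * np.arange(336) / 24) + 0.5 * np.sin(2 * np.pi * np.arange(336) / 168)
print(seasonal_init(x, pred_len=96).shape)     # (96,)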

Requirements

  • python == 3.11.4
  • numpy == 1.24.3
  • pandas == 1.5.3
  • scipy == 1.11.3
  • scikit_learn == 0.24.1
  • torch == 2.1.0+cu118
  • EMD-signal == 1.5.2

Dependencies can be installed using the following command:

pip install -r requirements.txt

Data

The ETT, ECL, Traffic and Weather datasets were acquired at: here. The Solar dataset was acquired at: Solar. The raw data of the Air dataset was acquired at: Air. The raw data of the River dataset was acquired at: River. The raw data of the BTC dataset was acquired at: BTC. The raw data of the ETH dataset was acquired at: ETH. The last four datasets (Air-ETH) require proper data preparation before use, so preprocessed versions have already been arranged in this repository. If you are interested in their raw data, you can also preprocess these four datasets yourself with the provided preprocessing program (expounded in a later section).

Data Preparation

After you acquire the raw data of all datasets, please place them separately in the corresponding folders at ./FPPformerV2/data.

We move the folders ./ETT-data, ./electricity, ./traffic and ./weather from here (the folder tree in the link is shown below) into the folder ./data and rename them to ./ETT, ./ECL, ./Traffic and ./weather respectively. We rename the ECL/Traffic files from electricity.csv/traffic.csv to ECL.csv/Traffic.csv and rename their last variable from OT back to the original MT_321/Sensor_861 respectively (a small helper sketch for this renaming step is given after the folder tree below).

The folder tree in https://drive.google.com/drive/folders/1ZOYpTUa82_jCcxIdTmyr0LXQfvaM9vIy?usp=sharing:
|-autoformer
| |-ETT-data
| | |-ETTh1.csv
| | |-ETTh2.csv
| | |-ETTm1.csv
| | |-ETTm2.csv
| |
| |-electricity
| | |-electricity.csv
| |
| |-traffic
| | |-traffic.csv
| |
| |-weather
| | |-weather.csv
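
For convenience, the renaming described above can be scripted. The following sketch is not part of the repository; it assumes the downloaded folders have already been copied into ./data and renamed as described, and it renames the ECL/Traffic files while restoring the original name of their last variable:

import pandas as pd

# Assumed layout: ./data/ECL and ./data/Traffic already contain the downloaded files.
for old_path, new_path, last_var in [
    ("./data/ECL/electricity.csv", "./data/ECL/ECL.csv", "MT_321"),
    ("./data/Traffic/traffic.csv", "./data/Traffic/Traffic.csv", "Sensor_861"),
]:
    df = pd.read_csv(old_path)
    df = df.rename(columns={"OT": last_var})  # restore the original variable name
    df.to_csv(new_path, index=False)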

We move the folder ./financial from Solar (the folder tree in the link is shown below) into the folder ./data and rename it to ./Solar.

The folder tree in https://drive.google.com/drive/folders/1Gv1MXjLo5bLGep4bsqDyaNMI2oQC9GH2?usp=sharing:
|-dataset
| |-financial
| | |-solar_AL.txt

We place the raw data of Air/River/BTC/ETH from Air/River/BTC/ETH (the folder trees in the links are shown below) into the corresponding folders ./Air, ./River, ./BTC and ./ETH under ./data respectively.

The folder tree in https://archive.ics.uci.edu/dataset/360/air+quality:
|-air+quality
| |-AirQualityUCI.csv
| |-AirQualityUCI.xlsx

The folder tree in https://www.kaggle.com/datasets/samanemami/river-flowrf2:
|-river-flowrf2
| |-RF2.csv

The folder tree in https://www.kaggle.com/datasets/prasoonkottarathil/btcinusd:
|-btcinusd
| |-BTC-Hourly.csv

The folder tree in https://www.kaggle.com/datasets/franoisgeorgesjulien/crypto:
|-crypto
| |-Binance_ETHUSDT_1h (1).csv

Then you can run ./data/preprocess.py to preprocess the raw data of the Air, River, BTC and ETH datasets. Attention! If you directly use the preprocessed datasets provided in this repository, there is no need to run ./data/preprocess.py; running it again on the already preprocessed files would cause errors.

In preprocess.py, we replace the missing values, which are tagged with the value -200, by the average of the normal values. We remove the variable NMHC(GT) in the Air dataset because all of its data in the test subset is missing. In the River dataset, we only select the first eight variables, as the others are corresponding time-lagged observations. Moreover, we remove the discrete variables in the BTC/ETH datasets.
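
To make the Air cleaning steps above concrete, here is a rough sketch of the idea (the CSV layout and column handling are assumptions; ./data/preprocess.py is the authoritative implementation):

import numpy as np
import pandas as pd

# Assumed raw layout of AirQualityUCI.csv: ';'-separated with ',' as the decimal mark.
air = pd.read_csv("./data/Air/AirQualityUCI.csv", sep=";", decimal=",")
air = air.drop(columns=["NMHC(GT)"])                      # all test-subset data of this variable is missing
numeric = air.select_dtypes(include=[np.number]).columns
air[numeric] = air[numeric].replace(-200, np.nan)         # -200 tags missing values
air[numeric] = air[numeric].fillna(air[numeric].mean())   # fill with the average of the normal values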

After you successfully run ./data/preprocess.py, you will obtain folder tree:

|-data
| |-Air
| | |-Air.csv
| |
| |-BTC
| | |-BTC.csv
| |
| |-ECL
| | |-ECL.csv
| |
| |-ETH
| | |-ETH.csv
| |
| |-ETT
| | |-ETTh1.csv
| | |-ETTh2.csv
| | |-ETTm1.csv
| | |-ETTm2.csv
| |
| |-River
| | |-River.csv
| |
| |-Solar
| | |-solar_AL.txt
| |
| |-Traffic
| | |-Traffic.csv
| |
| |-weather
| | |-weather.csv

Baseline

We select eight up-to-date baselines, including three TSFTs (ARM, iTransformer, Basisformer), two TSFMs (TSMixer, FreTS), one TCN (ModernTCN), one RNN-based forecasting method (WITRAN) and one cutting-edge statistics-based forecasting method (OneShotSTL). Most of these baselines were proposed after FPPformer, and their state-of-the-art performance makes them capable of challenging or even surpassing it. The origins of their source code are given below:

| Baseline | Source Code |
| --- | --- |
| ARM | https://openreview.net/forum?id=JWpwDdVbaM |
| iTransformer | https://github.com/thuml/iTransformer |
| Basisformer | https://github.com/nzl5116190/Basisformer |
| TSMixer | https://github.com/google-research/google-research/tree/master/tsmixer |
| FreTS | https://github.com/aikunyi/frets |
| ModernTCN | https://openreview.net/forum?id=vpJMJerXHU |
| WITRAN | https://github.com/Water2sea/WITRAN |
| OneShotSTL | https://github.com/xiao-he/oneshotstl |

Moreover, the default experiment settings/parameters of the aforementioned eight baselines are given below:

| Baselines | Settings/Parameters name | Descriptions | Default mechanisms/values |
| --- | --- | --- | --- |
| ARM | d_model | The number of hidden dimensions | 64 |
| | n_heads | The number of heads in the multi-head attention mechanism | 8 |
| | e_layers | The number of encoder layers | 2 |
| | d_layers | The number of decoder layers | 1 |
| | preprocessing_method | The preprocessing method | AUEL |
| | conv_size | The size of kernels in conv layers | [49, 145, 385] |
| | conv_padding | The padding value | [24, 72, 192] |
| | ema_alpha | The trainable EMA parameter | 0.9 |
| iTransformer | d_model | The number of hidden dimensions | 512 |
| | d_ff | Dimension of fcn | 512 |
| | n_heads | The number of heads in the multi-head attention mechanism | 8 |
| | e_layers | The number of encoder layers | 3 |
| Basisformer | N | The number of learnable bases | 10 |
| | block_nums | The number of blocks | 2 |
| | bottleneck | Reduction of bottleneck | 2 |
| | map_bottleneck | Reduction of mapping bottleneck | 2 |
| | n_heads | The number of heads in the multi-head attention mechanism | 16 |
| | d_model | The number of hidden dimensions | 100 |
| TSMixer | n_block | The number of blocks for the deep architecture | 2 |
| | d_model | The hidden feature dimension | 64 |
| FreTS | embed_size | The number of embedding dimensions | 128 |
| | hidden_size | The number of hidden dimensions | 256 |
| | channel_independence | Whether channels are dependent | 1 |
| ModernTCN | d_model | The number of hidden dimensions | 64 |
| | ffn_ratio | The FFN ratio | 8 |
| | kernel | The kernel size | 51 |
| | patch_size | The patch size | 8 |
| | stride | The stride value | 4 |
| | e_layers | The number of ModernTCN blocks | 3 |
| WITRAN | d_model | The number of hidden dimensions | 32 |
| | e_layers | The number of encoder layers | 8 |
| | WITRAN_dec | The prediction module of WITRAN | Concat |
| | WITRAN_deal | WITRAN deal data type | None |
| | WITRAN_grid_cols | Numbers of data grid cols for WITRAN | 24 |
| OneShotSTL | lambda1 | The hyper-parameter to control smoothness | 1.0 |
| | lambda2 | The hyper-parameter to control smoothness | 0.5 |
| | lambda3 | The hyper-parameter to control smoothness | 1.0 |

Usage

Commands for training and testing FPPformerV2 on all datasets for multivariate/univariate forecasting are in ./scripts/Main.sh and ./scripts/Univariate_ECL.sh respectively.

For more information on the parameters, please refer to main.py.

We provide a complete command for training and testing FPPformerV2:

For multivariate forecasting:

python -u main.py --data <data> --features <features> --input_len <input_len> --pred_len <pred_len> --encoder_layer <encoder_layer> --patch_size <patch_size> --d_model <d_model> --learning_rate <learning_rate> --dropout <dropout> --batch_size <batch_size> --train_epochs <train_epochs> --patience <patience> --itr <itr> --train --Cross <Cross> --EMD <EMD>

For univariate forecasting:

python -u main.py --data <data> --features <features> --input_len <input_len> --pred_len <pred_len> --encoder_layer <encoder_layer> --patch_size <patch_size> --d_model <d_model> --learning_rate <learning_rate> --dropout <dropout> --batch_size <batch_size> --train_epochs <train_epochs> --patience <patience> --itr <itr> --train --target <target> --EMD <EMD>

Here we provide a more detailed and complete command description for training and testing the model:

| Parameter name | Description of parameter |
| --- | --- |
| data | The dataset name |
| root_path | The root path of the data file |
| data_path | The data file name |
| features | The forecasting task. This can be set to M or S (M: multivariate forecasting, S: univariate forecasting) |
| target | Target feature in the S task |
| ori_target | Default target, which determines the order of the EMD results |
| checkpoints | Location of model checkpoints |
| input_len | Input sequence length |
| pred_len | Prediction sequence length |
| enc_in | Input size |
| dec_out | Output size |
| d_model | Dimension of model |
| dropout | Dropout rate |
| encoder_layer | The number of encoder layers |
| patch_size | The size of each patch |
| Cross | Whether to use cross-variable attention |
| EMD | Whether to use EMD as the prediction initialization |
| itr | The number of experiment repetitions |
| train_epochs | Training epochs of the second stage |
| batch_size | The batch size of training input data in the second stage |
| patience | Early stopping patience |
| learning_rate | Optimizer learning rate |
Results

The experiment parameters for each dataset are formatted in the Main.sh and Univariate_ECL.sh files in the directory ./scripts. You can refer to these parameters for experiments, and you can also adjust the parameters to obtain better MSE and MAE results or to draw better prediction figures.

We provide the results of the EMD process in the link EMD. You can download them and place them in the corresponding folders at ./FPPformerV2/EMD to reduce the time consumption.
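
If you prefer to compute the EMD results yourself instead of downloading them, caching the decompositions avoids repeating the costly EMD step on every run. The sketch below only illustrates such caching (the file layout and naming are assumptions, not the repository's actual format):

import os

import numpy as np
import pandas as pd
from PyEMD import EMD  # provided by the EMD-signal package

# Assumed cache layout: ./EMD/<dataset>/<variable>.npy holding the IMF array.
def cache_emd(csv_path, dataset, out_root="./EMD"):
    df = pd.read_csv(csv_path)
    out_dir = os.path.join(out_root, dataset)
    os.makedirs(out_dir, exist_ok=True)
    for col in df.select_dtypes(include=[np.number]).columns:
        out_file = os.path.join(out_dir, f"{col}.npy")
        if os.path.exists(out_file):
            continue                                     # reuse the cached decomposition
        imfs = EMD().emd(df[col].to_numpy(dtype=float))  # can be slow for long series
        np.save(out_file, imfs)

cache_emd("./data/ETT/ETTh1.csv", "ETTh1")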



Figure 2. Multivariate forecasting results under 1-hour-level datasets



Figure 3. Multivariate forecasting results under minute-level datasets



Figure 4. Univariate forecasting results

Contact

If you have any questions, feel free to contact Li Shen via email (shenli@buaa.edu.cn) or GitHub issues. Pull requests are highly welcome!