thuml/iTransformer

Could you provide the 96-step prediction results for all PEMS datasets?

Closed this issue · 7 comments

Training on a single RTX 4090 with exactly the same training parameters as in the provided script, I did not obtain the paper's results on the PEMS08 dataset; the gap is large, regardless of whether use_norm is modified.
And it is not only PEMS08: the other PEMS datasets have the same problem. Short-horizon predictions roughly match the paper, but the 48- and 96-step horizons are far off.

I ran into a similar problem, though it only seems to appear on the PEMS07 and PEMS08 datasets, and my input_len is longer than the one used in the paper.

Yes. I have been improving on the paper's model and found that on most datasets and horizons the paper's numbers can be surpassed; for some of them, training without norm, the same as in the source code, also beat the paper's results. Only the 48- and 96-step horizons on PEMS08 perform poorly, and when reproducing I also could not obtain the same results, which is why I wanted to ask the author.
@FrankHo-Hwc


Yes. I saw in an earlier issue that tuning the learning rate and the use_norm option can improve the results. At the 96-step horizon the results did improve, but they are still worse than the paper's, so a response from the author is still needed.

@bigdata0 What settings did you use to get the results reported in the paper for each of the PEMS datasets? I ran the provided script and adjusted use_norm, but the values were still quite different, even on PEMS03 and PEMS04.

@JerayuT

  1. Check whether your itransformer.py comes from the Time-Series-Library or from this repository. In the version integrated into the Time-Series-Library, use_norm is enabled by default and cannot be changed from the command line; in this repository's version, use_norm can be modified.
  2. On the same dataset, try setting use_norm to 0 for some prediction horizons but not for others. For example, when using the previous 96 steps to predict 12 steps, leave use_norm enabled; when predicting 96 steps, set it to 0. A sketch of what use_norm does is shown after this list.
    With the two methods above I can reproduce most of the results in the paper, but some results still cannot be reproduced.
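For context, use_norm applies series-wise stationarization: each variate is normalized over its lookback window before the model, and the prediction is de-normalized afterward. Below is a minimal Python sketch of that idea; the model call and shapes are illustrative, not the repository's exact code:

import torch

def forecast_with_norm(model, x_enc, use_norm=True):
    # x_enc: [batch, seq_len, n_variates]
    if use_norm:
        # Normalize each series over its lookback window
        means = x_enc.mean(dim=1, keepdim=True).detach()
        x = x_enc - means
        stdev = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + 1e-5)
        x = x / stdev
    else:
        x = x_enc

    y = model(x)  # [batch, pred_len, n_variates]

    if use_norm:
        # De-normalize: restore each series' original scale and level
        y = y * stdev + means
    return y

Disabling use_norm skips both steps, which changes how the model copes with the non-stationary PEMS series and is presumably why the best setting can differ per horizon.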

@bigdata0 I'm currently using this version of the repository. I was also wondering whether you adjusted the learning rate or anything else. When I ran the PEMS03 dataset with use_norm set to 0 for the prediction horizons {12, 24, 48, 96}, I got better results than in my previous runs, but for horizons 48 and 96 the values are still quite far from the results in the paper.
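For reference, with the default lradj='type1' schedule the learning rate is simply halved from epoch to epoch, which is what the "Updating learning rate to ..." lines in the log below show. A minimal sketch, assuming the Time-Series-Library-style schedule (the function name is illustrative):

def type1_lr(initial_lr: float, epoch: int) -> float:
    # lradj='type1': halve the learning rate once per epoch (1-indexed),
    # e.g. 0.001, 0.0005, 0.00025, ... as in the log below
    return initial_lr * 0.5 ** (epoch - 1)

So unless --learning_rate or --lradj is changed, "adjusting the learning rate" only shifts this whole decaying sequence up or down.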

@JerayuT Okay, you can refer to my configuration parameters.
python -u run.py --is_training 1 --root_path ./dataset/PEMS/ --data_path PEMS03.npz --model_id PEMS03_96_96 --model iTransformer --data PEMS --features M --seq_len 96 --pred_len 96 --e_layers 4 --enc_in 358 --dec_in 358 --c_out 358 --des 'Exp' --d_model 512 --d_ff 512 --learning_rate 0.001 --itr 1 --use_norm 0
Args in experiment:
Namespace(is_training=1, model_id='PEMS03_96_96', model='iTransformer', data='PEMS', root_path='./dataset/PEMS/', data_path='PEMS03.npz', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=96, label_len=48, pred_len=96, enc_in=358, dec_in=358, c_out=358, d_model=512, n_heads=8, e_layers=4, d_layers=1, d_ff=512, moving_avg=25, factor=1, distil=True, dropout=0.1, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.001, des='Exp', loss='MSE', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1,2,3', exp_name='MTSF', channel_independence=False, inverse=False, class_strategy='projection', target_root_path='./data/electricity/', target_data_path='electricity.csv', efficient_training=False, use_norm=0, partial_start_index=0)
Use GPU: cuda:0

start training : PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 15533
val 5051
test 5051
iters: 100, epoch: 1 | loss: 0.2663582
speed: 0.0661s/iter; left time: 313.9474s
iters: 200, epoch: 1 | loss: 0.2135554
speed: 0.0608s/iter; left time: 282.5925s
iters: 300, epoch: 1 | loss: 0.2013143
speed: 0.0572s/iter; left time: 260.3726s
iters: 400, epoch: 1 | loss: 0.1978480
speed: 0.0608s/iter; left time: 270.5316s
Epoch: 1 cost time: 30.010493993759155
Epoch: 1, Steps: 485 | Train Loss: 0.2256930 Vali Loss: 0.1850095 Test Loss: 0.2440917
Validation loss decreased (inf --> 0.185009). Saving model ...
Updating learning rate to 0.001
iters: 100, epoch: 2 | loss: 0.1753062
speed: 1.6842s/iter; left time: 7184.8671s
iters: 200, epoch: 2 | loss: 0.1824031
speed: 0.0648s/iter; left time: 269.8704s
iters: 300, epoch: 2 | loss: 0.1627843
speed: 0.0622s/iter; left time: 253.0022s
iters: 400, epoch: 2 | loss: 0.1522650
speed: 0.0618s/iter; left time: 244.9026s
Epoch: 2 cost time: 31.733307361602783
Epoch: 2, Steps: 485 | Train Loss: 0.1630596 Vali Loss: 0.1565045 Test Loss: 0.2239741
Validation loss decreased (0.185009 --> 0.156505). Saving model ...
Updating learning rate to 0.0005
iters: 100, epoch: 3 | loss: 0.1193284
speed: 1.7949s/iter; left time: 6786.6307s
iters: 200, epoch: 3 | loss: 0.1366769
speed: 0.0643s/iter; left time: 236.7496s
iters: 300, epoch: 3 | loss: 0.1275335
speed: 0.0637s/iter; left time: 228.0470s
iters: 400, epoch: 3 | loss: 0.1218285
speed: 0.0650s/iter; left time: 226.2647s
Epoch: 3 cost time: 32.51231002807617
Epoch: 3, Steps: 485 | Train Loss: 0.1272281 Vali Loss: 0.1349931 Test Loss: 0.2023265
Validation loss decreased (0.156505 --> 0.134993). Saving model ...
Updating learning rate to 0.00025
iters: 100, epoch: 4 | loss: 0.1189363
speed: 1.8123s/iter; left time: 5973.2009s
iters: 200, epoch: 4 | loss: 0.1217045
speed: 0.0607s/iter; left time: 194.1133s
iters: 300, epoch: 4 | loss: 0.1013873
speed: 0.0603s/iter; left time: 186.7840s
iters: 400, epoch: 4 | loss: 0.1246471
speed: 0.0626s/iter; left time: 187.6594s
Epoch: 4 cost time: 30.79427933692932
Epoch: 4, Steps: 485 | Train Loss: 0.1147907 Vali Loss: 0.1244477 Test Loss: 0.1876199
Validation loss decreased (0.134993 --> 0.124448). Saving model ...
Updating learning rate to 0.000125
iters: 100, epoch: 5 | loss: 0.1103279
speed: 1.8241s/iter; left time: 5127.6685s
iters: 200, epoch: 5 | loss: 0.1176323
speed: 0.0640s/iter; left time: 173.4984s
iters: 300, epoch: 5 | loss: 0.1146321
speed: 0.0663s/iter; left time: 173.0208s
iters: 400, epoch: 5 | loss: 0.1095776
speed: 0.0660s/iter; left time: 165.7033s
Epoch: 5 cost time: 32.98799157142639
Epoch: 5, Steps: 485 | Train Loss: 0.1096915 Vali Loss: 0.1192572 Test Loss: 0.1804807
Validation loss decreased (0.124448 --> 0.119257). Saving model ...
Updating learning rate to 6.25e-05
iters: 100, epoch: 6 | loss: 0.1013640
speed: 1.7467s/iter; left time: 4062.9149s
iters: 200, epoch: 6 | loss: 0.1459941
speed: 0.0680s/iter; left time: 151.3634s
iters: 300, epoch: 6 | loss: 0.0991267
speed: 0.0720s/iter; left time: 153.0836s
iters: 400, epoch: 6 | loss: 0.1169957
speed: 0.0729s/iter; left time: 147.6160s
Epoch: 6 cost time: 35.339478731155396
Epoch: 6, Steps: 485 | Train Loss: 0.1069848 Vali Loss: 0.1189177 Test Loss: 0.1792915
Validation loss decreased (0.119257 --> 0.118918). Saving model ...
Updating learning rate to 3.125e-05
iters: 100, epoch: 7 | loss: 0.0993732
speed: 1.7634s/iter; left time: 3246.3705s
iters: 200, epoch: 7 | loss: 0.1180835
speed: 0.0668s/iter; left time: 116.2247s
iters: 300, epoch: 7 | loss: 0.0905823
speed: 0.0642s/iter; left time: 105.3868s
iters: 400, epoch: 7 | loss: 0.1043306
speed: 0.0647s/iter; left time: 99.7038s
Epoch: 7 cost time: 32.913817405700684
Epoch: 7, Steps: 485 | Train Loss: 0.1055459 Vali Loss: 0.1175509 Test Loss: 0.1766107
Validation loss decreased (0.118918 --> 0.117551). Saving model ...
Updating learning rate to 1.5625e-05
iters: 100, epoch: 8 | loss: 0.1248059
speed: 1.7733s/iter; left time: 2404.6006s
iters: 200, epoch: 8 | loss: 0.1046225
speed: 0.0637s/iter; left time: 79.9484s
iters: 300, epoch: 8 | loss: 0.1021659
speed: 0.0630s/iter; left time: 72.8100s
iters: 400, epoch: 8 | loss: 0.1225355
speed: 0.0610s/iter; left time: 64.4164s
Epoch: 8 cost time: 31.724972248077393
Epoch: 8, Steps: 485 | Train Loss: 0.1047085 Vali Loss: 0.1169733 Test Loss: 0.1760689
Validation loss decreased (0.117551 --> 0.116973). Saving model ...
Updating learning rate to 7.8125e-06
iters: 100, epoch: 9 | loss: 0.1084197
speed: 1.7494s/iter; left time: 1523.6881s
iters: 200, epoch: 9 | loss: 0.1012328
speed: 0.0658s/iter; left time: 50.7213s
iters: 300, epoch: 9 | loss: 0.1137820
speed: 0.0622s/iter; left time: 41.7486s
iters: 400, epoch: 9 | loss: 0.1025800
speed: 0.0630s/iter; left time: 35.9713s
Epoch: 9 cost time: 32.4756760597229
Epoch: 9, Steps: 485 | Train Loss: 0.1042602 Vali Loss: 0.1166037 Test Loss: 0.1761539
Validation loss decreased (0.116973 --> 0.116604). Saving model ...
Updating learning rate to 3.90625e-06
iters: 100, epoch: 10 | loss: 0.1130980
speed: 1.7666s/iter; left time: 681.9241s
iters: 200, epoch: 10 | loss: 0.1034796
speed: 0.0580s/iter; left time: 16.5877s
iters: 300, epoch: 10 | loss: 0.0804347
speed: 0.0600s/iter; left time: 11.1661s
iters: 400, epoch: 10 | loss: 0.0938171
speed: 0.0621s/iter; left time: 5.3409s
Epoch: 10 cost time: 30.659968376159668
Epoch: 10, Steps: 485 | Train Loss: 0.1040058 Vali Loss: 0.1168290 Test Loss: 0.1764464
EarlyStopping counter: 1 out of 3
Updating learning rate to 1.953125e-06
testing : PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 5051
test shape: (5051, 1, 96, 358) (5051, 1, 96, 358)
test shape: (5051, 96, 358) (5051, 96, 358)
mse:0.17615370452404022, mae:0.2849079966545105
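For completeness, the final mse/mae are plain averages over the test arrays of shape (n_samples, pred_len, n_variates) printed above. A minimal sketch, assuming numpy arrays preds and trues (names illustrative):

import numpy as np

def mse_mae(preds: np.ndarray, trues: np.ndarray):
    # preds, trues: (n_samples, pred_len, n_variates), e.g. (5051, 96, 358)
    mse = np.mean((preds - trues) ** 2)
    mae = np.mean(np.abs(preds - trues))
    return mse, mae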