GestaltCogTeam/BasicTS

[🐞] Anomaly: test-set metrics are exactly identical across epochs when curriculum learning is used

Closed this issue · 6 comments

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in tutorial?

  • I have searched tutorial

Current Behavior

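  1. Hello, while reproducing D2STGNN on METR-LA, I found that the test-set metrics have exactly the same values across many of the early training epochs.
  2. Training uses curriculum learning, so at first only the first prediction step contributes to the gradient. The model parameters are still being updated, though, so the predictions should differ numerically from epoch to epoch, and the metric values should not be exactly identical. Yet the Test Metric values in the log are exactly identical. In the log below, Test MAE, Test MAPE, and Test RMSE are the same for each of the first 12 epochs (only the first few epochs are pasted here).
Screenshot 2024-11-30 10 38 56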

Expected Behavior

No response

Environment

- OS: Ubuntu 22.04.2
- DEVICE: Tesla V100
- NVIDIA Driver:
- CUDA: 11.4
- NVIDIA GPU Memory: 32GB
- PyTorch: 1.10.0

BasicTS logs

2024-11-27 04:06:24,681 - easytorch-training - INFO - Initializing training.
2024-11-27 04:06:24,681 - easytorch-training - INFO - Set clip grad, param: {'max_norm': 5.0}
2024-11-27 04:06:24,681 - easytorch-training - INFO - Building training data loader.
2024-11-27 04:06:24,725 - easytorch-training - INFO - Train dataset length: 23968
2024-11-27 04:06:24,727 - easytorch-training - INFO - Set optim: Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.002
weight_decay: 1e-05
)
2024-11-27 04:06:24,727 - easytorch-training - INFO - Set lr_scheduler: <torch.optim.lr_scheduler.MultiStepLR object at 0x7f6729e66610>
2024-11-27 04:06:24,730 - easytorch-training - INFO - Initializing validation.
2024-11-27 04:06:24,730 - easytorch-training - INFO - Building val data loader.
2024-11-27 04:06:24,734 - easytorch-training - INFO - Validation dataset length: 3404
2024-11-27 04:06:24,739 - easytorch-training - INFO - Test dataset length: 6831
2024-11-27 04:06:24,740 - easytorch-training - INFO - Number of parameters: 391962
2024-11-27 04:06:24,740 - easytorch-training - INFO - Epoch 1 / 100
2024-11-27 04:11:06,531 - easytorch-training - INFO - Result : [train/time: 281.79 (s), train/lr: 2.00e-03, train/loss: 2.4959, train/MAE: 2.4959, train/MAPE: 0.0596, train/RMSE: 4.2734]
2024-11-27 04:11:06,532 - easytorch-training - INFO - Start validation.
2024-11-27 04:11:11,930 - easytorch-training - INFO - Result : [val/time: 5.40 (s), val/loss: 9.1437, val/MAE: 9.1437, val/MAPE: 0.2509, val/RMSE: 11.8495]
2024-11-27 04:11:11,969 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:11:22,620 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:11:22,623 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:11:22,626 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:11:22,667 - easytorch-training - INFO - Result : [test/time: 10.70 (s), test/loss: 9.2586, test/MAE: 9.5163, test/MAPE: 0.2739, test/RMSE: 12.9837]
2024-11-27 04:11:22,705 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_001.pt saved
2024-11-27 04:11:22,706 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:23:01
2024-11-27 04:11:22,706 - easytorch-training - INFO - Epoch 2 / 100
2024-11-27 04:15:57,777 - easytorch-training - INFO - Result : [train/time: 275.07 (s), train/lr: 1.00e-03, train/loss: 2.1930, train/MAE: 2.1930, train/MAPE: 0.0505, train/RMSE: 3.7551]
2024-11-27 04:15:57,778 - easytorch-training - INFO - Start validation.
2024-11-27 04:16:03,192 - easytorch-training - INFO - Result : [val/time: 5.41 (s), val/loss: 8.8908, val/MAE: 8.8908, val/MAPE: 0.2427, val/RMSE: 11.6302]
2024-11-27 04:16:03,327 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:16:13,861 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:16:13,864 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:16:13,868 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:16:13,914 - easytorch-training - INFO - Result : [test/time: 10.59 (s), test/loss: 9.0002, test/MAE: 9.2428, test/MAPE: 0.2647, test/RMSE: 12.6893]
2024-11-27 04:16:13,951 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_002.pt saved
2024-11-27 04:16:13,951 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:17:25
2024-11-27 04:16:13,951 - easytorch-training - INFO - Epoch 3 / 100
2024-11-27 04:20:59,040 - easytorch-training - INFO - Result : [train/time: 285.09 (s), train/lr: 1.00e-03, train/loss: 2.1270, train/MAE: 2.1270, train/MAPE: 0.0489, train/RMSE: 3.6404]
2024-11-27 04:20:59,041 - easytorch-training - INFO - Start validation.
2024-11-27 04:21:04,460 - easytorch-training - INFO - Result : [val/time: 5.42 (s), val/loss: 8.8302, val/MAE: 8.8302, val/MAPE: 0.2389, val/RMSE: 11.5554]
2024-11-27 04:21:04,600 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:21:15,206 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:21:15,210 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:21:15,213 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:21:15,260 - easytorch-training - INFO - Result : [test/time: 10.66 (s), test/loss: 8.9314, test/MAE: 9.1684, test/MAPE: 0.2602, test/RMSE: 12.5837]
2024-11-27 04:21:15,295 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_003.pt saved
2024-11-27 04:21:15,295 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:21:09
2024-11-27 04:21:15,295 - easytorch-training - INFO - Epoch 4 / 100
2024-11-27 04:26:02,339 - easytorch-training - INFO - Result : [train/time: 287.04 (s), train/lr: 1.00e-03, train/loss: 2.0948, train/MAE: 2.0948, train/MAPE: 0.0480, train/RMSE: 3.5915]
2024-11-27 04:26:02,341 - easytorch-training - INFO - Start validation.
2024-11-27 04:26:07,764 - easytorch-training - INFO - Result : [val/time: 5.42 (s), val/loss: 8.8518, val/MAE: 8.8518, val/MAPE: 0.2390, val/RMSE: 11.5371]
2024-11-27 04:26:18,365 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:26:18,369 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:26:18,372 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:26:18,410 - easytorch-training - INFO - Result : [test/time: 10.64 (s), test/loss: 8.9607, test/MAE: 9.1991, test/MAPE: 0.2605, test/RMSE: 12.5666]
2024-11-27 04:26:18,457 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_004.pt saved
2024-11-27 04:26:18,458 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:23:47
2024-11-27 04:26:18,458 - easytorch-training - INFO - Epoch 5 / 100
2024-11-27 04:30:58,033 - easytorch-training - INFO - Result : [train/time: 279.57 (s), train/lr: 1.00e-03, train/loss: 2.0766, train/MAE: 2.0766, train/MAPE: 0.0475, train/RMSE: 3.5595]
2024-11-27 04:30:58,034 - easytorch-training - INFO - Start validation.
2024-11-27 04:31:03,418 - easytorch-training - INFO - Result : [val/time: 5.38 (s), val/loss: 8.8210, val/MAE: 8.8210, val/MAPE: 0.2387, val/RMSE: 11.5208]
2024-11-27 04:31:03,460 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:31:14,041 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:31:14,044 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:31:14,047 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:31:14,087 - easytorch-training - INFO - Result : [test/time: 10.63 (s), test/loss: 8.9170, test/MAE: 9.1517, test/MAPE: 0.2599, test/RMSE: 12.5297]
2024-11-27 04:31:14,123 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_005.pt saved
2024-11-27 04:31:14,124 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:22:52
2024-11-27 04:31:14,124 - easytorch-training - INFO - Epoch 6 / 100
2024-11-27 04:35:59,957 - easytorch-training - INFO - Result : [train/time: 285.83 (s), train/lr: 1.00e-03, train/loss: 2.0606, train/MAE: 2.0606, train/MAPE: 0.0471, train/RMSE: 3.5332]
2024-11-27 04:35:59,958 - easytorch-training - INFO - Start validation.
2024-11-27 04:36:05,357 - easytorch-training - INFO - Result : [val/time: 5.40 (s), val/loss: 8.7914, val/MAE: 8.7914, val/MAPE: 0.2368, val/RMSE: 11.4699]
2024-11-27 04:36:05,395 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:36:15,956 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:36:15,959 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:36:15,962 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:36:16,001 - easytorch-training - INFO - Result : [test/time: 10.60 (s), test/loss: 8.8976, test/MAE: 9.1173, test/MAPE: 0.2578, test/RMSE: 12.4579]
2024-11-27 04:36:16,038 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_006.pt saved
2024-11-27 04:36:16,038 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:23:59
2024-11-27 04:36:16,038 - easytorch-training - INFO - Epoch 7 / 100
2024-11-27 04:40:56,333 - easytorch-training - INFO - Result : [train/time: 280.30 (s), train/lr: 1.00e-03, train/loss: 2.2465, train/MAE: 2.2465, train/MAPE: 0.0528, train/RMSE: 3.9972]
2024-11-27 04:40:56,334 - easytorch-training - INFO - Start validation.
2024-11-27 04:41:01,788 - easytorch-training - INFO - Result : [val/time: 5.45 (s), val/loss: 6.4913, val/MAE: 6.4913, val/MAPE: 0.1738, val/RMSE: 9.4332]
2024-11-27 04:41:01,825 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:41:12,461 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:41:12,464 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:41:12,468 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:41:12,513 - easytorch-training - INFO - Result : [test/time: 10.69 (s), test/loss: 6.7522, test/MAE: 6.8303, test/MAPE: 0.1910, test/RMSE: 10.3509]
2024-11-27 04:41:12,549 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_007.pt saved
2024-11-27 04:41:12,549 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:23:30
2024-11-27 04:41:12,549 - easytorch-training - INFO - Epoch 8 / 100
2024-11-27 04:45:57,793 - easytorch-training - INFO - Result : [train/time: 285.24 (s), train/lr: 1.00e-03, train/loss: 2.2061, train/MAE: 2.2061, train/MAPE: 0.0517, train/RMSE: 3.9258]
2024-11-27 04:45:57,795 - easytorch-training - INFO - Start validation.
2024-11-27 04:46:03,281 - easytorch-training - INFO - Result : [val/time: 5.49 (s), val/loss: 6.4468, val/MAE: 6.4468, val/MAPE: 0.1865, val/RMSE: 9.6763]
2024-11-27 04:46:03,321 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:46:14,082 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:46:14,086 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:46:14,090 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:46:14,129 - easytorch-training - INFO - Result : [test/time: 10.81 (s), test/loss: 6.7420, test/MAE: 6.9054, test/MAPE: 0.2083, test/RMSE: 10.9166]
2024-11-27 04:46:14,168 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_008.pt saved
2024-11-27 04:46:14,168 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:24:12
2024-11-27 04:46:14,168 - easytorch-training - INFO - Epoch 9 / 100
2024-11-27 04:50:57,949 - easytorch-training - INFO - Result : [train/time: 283.78 (s), train/lr: 1.00e-03, train/loss: 2.1945, train/MAE: 2.1945, train/MAPE: 0.0513, train/RMSE: 3.9039]
2024-11-27 04:50:57,951 - easytorch-training - INFO - Start validation.
2024-11-27 04:51:03,308 - easytorch-training - INFO - Result : [val/time: 5.36 (s), val/loss: 6.4208, val/MAE: 6.4208, val/MAPE: 0.1756, val/RMSE: 9.3953]
2024-11-27 04:51:03,345 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:51:13,853 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:51:13,857 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:51:13,860 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:51:13,904 - easytorch-training - INFO - Result : [test/time: 10.56 (s), test/loss: 6.6027, test/MAE: 6.7432, test/MAPE: 0.1934, test/RMSE: 10.3392]
2024-11-27 04:51:13,944 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_009.pt saved
2024-11-27 04:51:13,944 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:24:24
2024-11-27 04:51:13,944 - easytorch-training - INFO - Epoch 10 / 100
2024-11-27 04:55:56,927 - easytorch-training - INFO - Result : [train/time: 282.98 (s), train/lr: 1.00e-03, train/loss: 2.1832, train/MAE: 2.1832, train/MAPE: 0.0510, train/RMSE: 3.8829]
2024-11-27 04:55:56,929 - easytorch-training - INFO - Start validation.
2024-11-27 04:56:02,352 - easytorch-training - INFO - Result : [val/time: 5.42 (s), val/loss: 6.3411, val/MAE: 6.3411, val/MAPE: 0.1707, val/RMSE: 9.3277]
2024-11-27 04:56:02,389 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 04:56:12,902 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 04:56:12,905 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 04:56:12,909 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 04:56:12,953 - easytorch-training - INFO - Result : [test/time: 10.56 (s), test/loss: 6.5473, test/MAE: 6.6797, test/MAPE: 0.1888, test/RMSE: 10.2628]
2024-11-27 04:56:12,987 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_010.pt saved
2024-11-27 04:56:12,994 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:24:27
2024-11-27 04:56:12,994 - easytorch-training - INFO - Epoch 11 / 100
2024-11-27 05:00:56,126 - easytorch-training - INFO - Result : [train/time: 283.13 (s), train/lr: 1.00e-03, train/loss: 2.1755, train/MAE: 2.1755, train/MAPE: 0.0507, train/RMSE: 3.8688]
2024-11-27 05:00:56,127 - easytorch-training - INFO - Start validation.
2024-11-27 05:01:01,507 - easytorch-training - INFO - Result : [val/time: 5.38 (s), val/loss: 6.4857, val/MAE: 6.4857, val/MAPE: 0.1814, val/RMSE: 9.5756]
2024-11-27 05:01:12,048 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 05:01:12,051 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 05:01:12,055 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 05:01:12,098 - easytorch-training - INFO - Result : [test/time: 10.59 (s), test/loss: 6.6914, test/MAE: 6.8530, test/MAPE: 0.2013, test/RMSE: 10.6442]
2024-11-27 05:01:12,133 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_011.pt saved
2024-11-27 05:01:12,134 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:24:30
2024-11-27 05:01:12,134 - easytorch-training - INFO - Epoch 12 / 100
2024-11-27 05:05:56,935 - easytorch-training - INFO - Result : [train/time: 284.80 (s), train/lr: 1.00e-03, train/loss: 2.1657, train/MAE: 2.1657, train/MAPE: 0.0504, train/RMSE: 3.8465]
2024-11-27 05:05:56,936 - easytorch-training - INFO - Start validation.
2024-11-27 05:06:02,279 - easytorch-training - INFO - Result : [val/time: 5.34 (s), val/loss: 6.5621, val/MAE: 6.5621, val/MAPE: 0.1758, val/RMSE: 9.6406]
2024-11-27 05:06:12,756 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 11.5078, Test MAPE: 0.3140, Test RMSE: 14.0593
2024-11-27 05:06:12,759 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 11.5074, Test MAPE: 0.3140, Test RMSE: 14.0590
2024-11-27 05:06:12,763 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 11.5058, Test MAPE: 0.3139, Test RMSE: 14.0566
2024-11-27 05:06:12,809 - easytorch-training - INFO - Result : [test/time: 10.53 (s), test/loss: 6.6839, test/MAE: 6.8372, test/MAPE: 0.1928, test/RMSE: 10.5252]
2024-11-27 05:06:12,844 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_012.pt saved
2024-11-27 05:06:12,845 - easytorch-training - INFO - The estimated training finish time is 2024-11-27 12:24:45
2024-11-27 05:06:12,845 - easytorch-training - INFO - Epoch 13 / 100
2024-11-27 05:10:50,587 - easytorch-training - INFO - Result : [train/time: 277.74 (s), train/lr: 1.00e-03, train/loss: 2.2941, train/MAE: 2.2941, train/MAPE: 0.0548, train/RMSE: 4.1819]
2024-11-27 05:10:50,589 - easytorch-training - INFO - Start validation.
2024-11-27 05:10:56,080 - easytorch-training - INFO - Result : [val/time: 5.49 (s), val/loss: 4.0653, val/MAE: 4.0653, val/MAPE: 0.1185, val/RMSE: 6.9830]
2024-11-27 05:10:56,122 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_best_val_MAE.pt saved
2024-11-27 05:11:06,861 - easytorch-training - INFO - Evaluate best model on test data for horizon 3, Test MAE: 2.6027, Test MAPE: 0.0655, Test RMSE: 4.9406
2024-11-27 05:11:06,865 - easytorch-training - INFO - Evaluate best model on test data for horizon 6, Test MAE: 4.8427, Test MAPE: 0.1378, Test RMSE: 7.9470
2024-11-27 05:11:06,869 - easytorch-training - INFO - Evaluate best model on test data for horizon 12, Test MAE: 6.9599, Test MAPE: 0.2262, Test RMSE: 11.8305
2024-11-27 05:11:06,915 - easytorch-training - INFO - Result : [test/time: 10.79 (s), test/loss: 4.3854, test/MAE: 4.4617, test/MAPE: 0.1354, test/RMSE: 8.2078]
2024-11-27 05:11:06,954 - easytorch-training - INFO - Checkpoint checkpoints/D2STGNN/METR-LA_100_12_12/82f08270e9e07d9a273be3fdebc27246/D2STGNN_013.pt saved

Steps To Reproduce

python experiments/train.py -c baselines/D2STGNN/METRLA.py --g 0

Anything else?

training_log_20241127040624.log

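Curriculum learning optimizes the horizons one at a time. These metrics do not change because those horizons have not been optimized yet. If you keep training, they will start to change.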
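For readers unfamiliar with the mechanism, here is a minimal sketch of curriculum learning over forecast steps. It is not BasicTS's actual implementation: `curriculum_mae`, `curriculum_length`, and `step_size` are hypothetical names, and the schedule (one more step unlocked every `step_size` epochs) is an assumption.

```python
import torch


def curriculum_length(epoch: int, step_size: int, horizon: int) -> int:
    """Hypothetical schedule: unlock one more forecast step every `step_size` epochs."""
    return min((epoch - 1) // step_size + 1, horizon)


def curriculum_mae(prediction: torch.Tensor, target: torch.Tensor, cl_length: int) -> torch.Tensor:
    """MAE over the first `cl_length` forecast steps only.

    Both tensors are assumed to have shape [batch, steps, nodes];
    later steps are excluded from the loss and get no gradient from it.
    """
    return (prediction[:, :cl_length] - target[:, :cl_length]).abs().mean()


# Toy data: step 1 has error 1.0, all later steps have error 5.0.
prediction = torch.zeros(2, 12, 3)
target = torch.full((2, 12, 3), 5.0)
target[:, 0] = 1.0

first_epoch_loss = curriculum_mae(prediction, target, curriculum_length(1, 6, 12))
print(first_epoch_loss.item())  # only step 1 is counted
```

With an assumed `step_size` of 6, step 3 would first enter the loss in epoch 13, which would be consistent with the horizon-3 metrics changing at epoch 13 in the log above. The real schedule depends on the experiment config.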

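Hello, my question is this: even if the corresponding horizon has not been optimized yet, the model parameters are still being updated, so it seems unlikely that the predictions stay exactly the same every time. Shouldn't the metric values differ at least slightly?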

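The vast majority of forecasting models use a fully connected layer as the regressor that produces the predictions. Its size is D×F, where D is the hidden dimension and F is the number of prediction steps. The column of parameters for a given step is updated only once optimization reaches that step; before that, the prediction for that step is effectively random.
For autoregressive models, such as Seq2Seq-style models, it would be as you describe.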
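The point about the D×F regressor can be checked with a small sketch. The sizes, the random inputs, and the sliced MAE loss are illustrative assumptions, not D2STGNN's actual output layer. Note that PyTorch stores the `nn.Linear` weight as (F, D), so each forecast step corresponds to a row rather than a column.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

D, F = 8, 12                    # hidden dimension and forecast steps (illustrative sizes)
regressor = nn.Linear(D, F)     # weight shape is (F, D): one row per forecast step
hidden = torch.randn(4, D)      # stand-in for the hidden states of 4 samples
target = torch.randn(4, F)

cl_length = 1                   # only the first step is in the loss
loss = (regressor(hidden)[:, :cl_length] - target[:, :cl_length]).abs().mean()
loss.backward()

# Gradient norm of the weight row belonging to each forecast step.
grad_norm_per_step = regressor.weight.grad.norm(dim=1)
print(grad_norm_per_step)       # non-zero for step 1, exactly zero for steps 2..12
```

A zero gradient does not by itself mean the parameters stay fixed: the optimizer in the log above is configured with `weight_decay`, which still acts on those rows.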

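  1. Hello, thank you for your reply. I am still a little puzzled, though: the parameters of the prediction layer are not updated, but the modules before the prediction layer are.
  2. Also, before optimization reaches the corresponding horizon, the predictions may indeed be random, but I would expect the metrics to fluctuate randomly as well, for example from 11.50 to 11.57. The log, however, shows values that do not change at all to four decimal places. That is what makes me curious, and I would like to discuss it with you.
  3. I wondered whether the weights of the linear prediction layer are all initialized to 0, which would make the output independent of whatever the preceding layers produce. But the default initialization of nn.Linear() does not seem to work that way?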
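On point 3, a quick check of what `nn.Linear` does by default. The sizes are arbitrary; the bound of 1/sqrt(in_features) is the documented range of PyTorch's default uniform initialization.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(in_features=10, out_features=5)

# By default, weights and biases are drawn uniformly from
# [-1/sqrt(in_features), 1/sqrt(in_features)], so they are not zero.
bound = 1.0 / math.sqrt(linear.in_features)
max_abs_weight = linear.weight.abs().max().item()
num_zero_weights = int((linear.weight == 0).sum().item())

print(f"bound={bound:.4f}, max |w|={max_abs_weight:.4f}, zero weights={num_zero_weights}")
```

So an all-zero regressor would have to come from an explicit initialization in the model code, not from the `nn.Linear` default.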
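import torch.nn as nn

# Example layer; replace it with the model's actual prediction layer.
linear = nn.Linear(in_features=10, out_features=5)

# Override the default initialization: set all weights and biases to 0.
nn.init.constant_(linear.weight, 0)
nn.init.constant_(linear.bias, 0)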

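That is an interesting question. As a first step, you can use the code above to change the default initialization of the linear layer. You can also use a tool such as pickle to save the hidden states and the linear layer's parameters, and run more detailed experiments on them. When we have time, we will run some experiments of our own to track down the root cause.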
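One possible way to capture the hidden states and the linear layer's parameters, as a sketch: a forward hook records what goes into the final layer, and `torch.save` (which pickles under the hood) writes it out. The `model` here is a stand-in, and the file name and dictionary keys are made up for illustration; in a real run the hook would be registered on the forecasting model's own prediction layer.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model; the last layer plays the role of the regressor.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 5))
regressor = model[-1]
captured = {}


def record_regressor_io(module, inputs, output):
    # `inputs` is a tuple; its first element is the hidden state fed to the layer.
    captured["hidden"] = inputs[0].detach().cpu()
    captured["output"] = output.detach().cpu()


handle = regressor.register_forward_hook(record_regressor_io)
with torch.no_grad():
    model(torch.randn(4, 10))
handle.remove()

snapshot = {
    "hidden": captured["hidden"],
    "output": captured["output"],
    "weight": regressor.weight.detach().cpu(),
    "bias": regressor.bias.detach().cpu(),
}
path = os.path.join(tempfile.gettempdir(), "regressor_snapshot.pt")
torch.save(snapshot, path)

loaded = torch.load(path)
print({name: tuple(tensor.shape) for name, tensor in loaded.items()})
```

Saving one snapshot per epoch makes it possible to compare how much the hidden states move against how much the regressor output moves.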

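Hello, this is resolved. The weights of the final prediction layer were indeed initialized to very small values (on the order of 1e-12), so the final outputs are also very small numbers, and the metrics show no difference within five decimal places.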
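The effect can be reproduced in isolation with a small sketch. Everything here is synthetic: the layer sizes, the zero bias, the random targets, and the helper name `mae_for` are assumptions, and the data rescaling that a real pipeline applies before computing metrics is left out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

regressor = nn.Linear(16, 12)
nn.init.constant_(regressor.weight, 1e-12)  # magnitude reported in the comment above
nn.init.zeros_(regressor.bias)              # assumption: bias contributes nothing

# Synthetic targets in a speed-like range.
target = torch.rand(64, 12) * 60 + 5


def mae_for(hidden: torch.Tensor) -> float:
    """MAE of the regressor output for the given hidden states."""
    with torch.no_grad():
        return (regressor(hidden) - target).abs().mean().item()


# Two very different sets of hidden states, standing in for the upstream
# modules before and after several epochs of updates.
mae_before = mae_for(torch.randn(64, 16))
mae_after = mae_for(torch.randn(64, 16) * 3.0)

print(f"{mae_before:.4f} vs {mae_after:.4f}")
```

Because the outputs are on the order of 1e-11 no matter what the hidden states are, the two printed values agree to four decimal places, which mirrors the unchanged per-horizon metrics in the log.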