thuml/iTransformer

got killed


bash ./scripts/multivariate_forecasting/Traffic/iTransformer.sh
Args in experiment:
Namespace(is_training=1, model_id='traffic_96_96', model='iTransformer', data='custom', root_path='./dataset/traffic/', data_path='traffic.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=96, label_len=48, pred_len=96, enc_in=862, dec_in=862, c_out=862, d_model=512, n_heads=8, e_layers=4, d_layers=1, d_ff=512, moving_avg=25, factor=1, distil=True, dropout=0.1, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=16, patience=3, learning_rate=0.001, des='Exp', loss='MSE', lradj='type1', use_amp=False, use_gpu=False, gpu=0, use_multi_gpu=False, devices='0,1,2,3', exp_name='MTSF', channel_independence=False, inverse=False, class_strategy='projection', target_root_path='./data/electricity/', target_data_path='electricity.csv', efficient_training=False, use_norm=True, partial_start_index=0)
Use CPU

start training : traffic_96_96_iTransformer_custom_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 12089
val 1661
test 3413
./scripts/multivariate_forecasting/Traffic/iTransformer.sh: line 24: 633 Killed python -u run.py --is_training 1 --root_path ./dataset/traffic/ --data_path traffic.csv --model_id traffic_96_96 --model $model_name --data custom --features M --seq_len 96 --pred_len 96 --e_layers 4 --enc_in 862 --dec_in 862 --c_out 862 --des 'Exp' --d_model 512 --d_ff 512 --batch_size 16 --learning_rate 0.001 --itr 1
Args in experiment:
Namespace(is_training=1, model_id='traffic_96_192', model='iTransformer', data='custom', root_path='./dataset/traffic/', data_path='traffic.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=96, label_len=48, pred_len=192, enc_in=862, dec_in=862, c_out=862, d_model=512, n_heads=8, e_layers=4, d_layers=1, d_ff=512, moving_avg=25, factor=1, distil=True, dropout=0.1, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=16, patience=3, learning_rate=0.001, des='Exp', loss='MSE', lradj='type1', use_amp=False, use_gpu=False, gpu=0, use_multi_gpu=False, devices='0,1,2,3', exp_name='MTSF', channel_independence=False, inverse=False, class_strategy='projection', target_root_path='./data/electricity/', target_data_path='electricity.csv', efficient_training=False, use_norm=True, partial_start_index=0)
Use CPU
start training : traffic_96_192_iTransformer_custom_M_ft96_sl48_ll192_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 11993
val 1565
test 3317
./scripts/multivariate_forecasting/Traffic/iTransformer.sh: line 45: 916 Killed python -u run.py --is_training 1 --root_path ./dataset/traffic/ --data_path traffic.csv --model_id traffic_96_192 --model $model_name --data custom --features M --seq_len 96 --pred_len 192 --e_layers 4 --enc_in 862 --dec_in 862 --c_out 862 --des 'Exp' --d_model 512 --d_ff 512 --batch_size 16 --learning_rate 0.001 --itr 1
Args in experiment:
Namespace(is_training=1, model_id='traffic_96_336', model='iTransformer', data='custom', root_path='./dataset/traffic/', data_path='traffic.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=96, label_len=48, pred_len=336, enc_in=862, dec_in=862, c_out=862, d_model=512, n_heads=8, e_layers=4, d_layers=1, d_ff=512, moving_avg=25, factor=1, distil=True, dropout=0.1, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=16, patience=3, learning_rate=0.001, des='Exp', loss='MSE', lradj='type1', use_amp=False, use_gpu=False, gpu=0, use_multi_gpu=False, devices='0,1,2,3', exp_name='MTSF', channel_independence=False, inverse=False, class_strategy='projection', target_root_path='./data/electricity/', target_data_path='electricity.csv', efficient_training=False, use_norm=True, partial_start_index=0)
Use CPU
start training : traffic_96_336_iTransformer_custom_M_ft96_sl48_ll336_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 11849
val 1421
test 3173
./scripts/multivariate_forecasting/Traffic/iTransformer.sh: line 66: 1241 Killed python -u run.py --is_training 1 --root_path ./dataset/traffic/ --data_path traffic.csv --model_id traffic_96_336 --model $model_name --data custom --features M --seq_len 96 --pred_len 336 --e_layers 4 --enc_in 862 --dec_in 862 --c_out 862 --des 'Exp' --d_model 512 --d_ff 512 --batch_size 16 --learning_rate 0.001 --itr 1
Args in experiment:
Namespace(is_training=1, model_id='traffic_96_720', model='iTransformer', data='custom', root_path='./dataset/traffic/', data_path='traffic.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=96, label_len=48, pred_len=720, enc_in=862, dec_in=862, c_out=862, d_model=512, n_heads=8, e_layers=4, d_layers=1, d_ff=512, moving_avg=25, factor=1, distil=True, dropout=0.1, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=16, patience=3, learning_rate=0.001, des='Exp', loss='MSE', lradj='type1', use_amp=False, use_gpu=False, gpu=0, use_multi_gpu=False, devices='0,1,2,3', exp_name='MTSF', channel_independence=False, inverse=False, class_strategy='projection', target_root_path='./data/electricity/', target_data_path='electricity.csv', efficient_training=False, use_norm=True, partial_start_index=0)
Use CPU
start training : traffic_96_720_iTransformer_custom_M_ft96_sl48_ll720_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 11465
val 1037
test 2789
./scripts/multivariate_forecasting/Traffic/iTransformer.sh: line 87: 1538 Killed python -u run.py --is_training 1 --root_path ./dataset/traffic/ --data_path traffic.csv --model_id traffic_96_720 --model $model_name --data custom --features M --seq_len 96 --pred_len 720 --e_layers 4 --enc_in 862 --dec_in 862 --c_out 862 --des 'Exp' --d_model 512 --d_ff 512 --batch_size 16 --learning_rate 0.001 --itr 1

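This is probably an out-of-memory (OOM) problem: a bare `Killed` message usually means the operating system terminated the process after it exhausted memory. Your log shows `Use CPU`, so the memory in question is host RAM rather than GPU memory. Monitor memory usage on the device while the script runs and reduce the batch size (currently `--batch_size 16`) accordingly.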
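One way to check the OOM hypothesis on a Linux host is to log available RAM and the process's peak resident memory during training. The sketch below uses only the Python standard library. The helper names (`parse_meminfo`, `available_gib`, `peak_rss_gib`) are hypothetical and are not part of the iTransformer code base.

```python
import resource
import sys


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gib(meminfo_path="/proc/meminfo"):
    """Return the kernel's estimate of available RAM in GiB (Linux only)."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / 1024**2  # kB -> GiB


def peak_rss_gib():
    """Return this process's peak resident set size in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024**3 if sys.platform == "darwin" else 1024**2
    return peak / divisor
```

Calling `available_gib()` and `peak_rss_gib()` every few iterations of the training loop and printing the results should show whether memory climbs until the process is killed. If it does, rerun with a smaller `--batch_size`. The `Namespace` output also lists `num_workers=10`; since each data-loading worker is a separate process, lowering that value may help as well, though this is an assumption and not something confirmed in this thread.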