Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xxxx
AlphaNext opened this issue · 0 comments
AlphaNext commented
Start command:
imagenetpath=mypath
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 moby_main.py \
--cfg configs/moby_swin_tiny.yaml --data-path ${imagenetpath} --batch-size 256
but I get the following "Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xxxx" error:
[2023-10-24 17:33:21 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][290/625] eta 0:05:52 lr 0.002772 time 0.5567 (1.0516) loss 10.5960 (10.9174) grad_norm 1.4802 (1.5236) mem 45716MB
[2023-10-24 17:33:38 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][300/625] eta 0:05:47 lr 0.002785 time 0.7607 (1.0707) loss 10.7823 (10.9141) grad_norm 2.3465 (1.5536) mem 45716MB
[2023-10-24 17:33:45 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][310/625] eta 0:05:33 lr 0.002797 time 0.9247 (1.0588) loss 10.9386 (10.9140) grad_norm 3.8597 (1.6136) mem 45716MB
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 65536.0
[2023-10-24 17:33:53 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][320/625] eta 0:05:20 lr 0.002810 time 0.5590 (1.0518) loss 11.4219 (10.9264) grad_norm 3.9233 (inf) mem 45716MB
[2023-10-24 17:34:00 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][330/625] eta 0:05:07 lr 0.002823 time 0.5751 (1.0412) loss 11.6204 (10.9487) grad_norm 2.7699 (inf) mem 45716MB
[2023-10-24 17:34:09 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][340/625] eta 0:04:55 lr 0.002836 time 0.5561 (1.0365) loss 11.2880 (10.9609) grad_norm 2.3273 (inf) mem 45716MB
[2023-10-24 17:34:16 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][350/625] eta 0:04:42 lr 0.002849 time 0.5530 (1.0271) loss 11.0601 (10.9651) grad_norm 0.9230 (inf) mem 45716MB
[2023-10-24 17:34:23 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][360/625] eta 0:04:30 lr 0.002861 time 0.5628 (1.0200) loss 10.9609 (10.9669) grad_norm 0.8707 (inf) mem 45716MB
[2023-10-24 17:34:30 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][370/625] eta 0:04:17 lr 0.002874 time 0.5648 (1.0094) loss 10.9728 (10.9655) grad_norm 1.9388 (inf) mem 45716MB
[2023-10-24 17:34:36 moby__swin_tiny__patch4_window7_224__odpr02_tdpr0_cm099_ct02_queue4096_proj2_pred2](moby_main.py 177): INFO Train: [3/300][380/625] eta 0:04:04 lr 0.002887 time 0.5568 (0.9993) loss 10.8801 (10.9645) grad_norm 0.6718 (inf) mem 45716MB
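For context, this message comes from dynamic loss scaling in fp16 mixed-precision training: when the scaled gradients contain inf/NaN, the optimizer step is skipped and the loss scale is halved, which is exactly what "Skipping step, loss scaler 0 reducing loss scale to ..." reports. A toy sketch of that logic, assuming the usual halve-on-overflow / grow-after-N-good-steps policy (the `LossScaler` class name and `growth_interval` default here are illustrative, not the actual Apex or PyTorch code):

```python
import math

class LossScaler:
    """Toy dynamic loss scaler: skip the step and halve the scale when
    gradients overflow (inf/NaN); double the scale after a run of good steps."""
    def __init__(self, init_scale=2.0**16, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, grads):
        # grads: flat list of (unscaled) gradient values for this iteration
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale /= 2.0      # "reducing loss scale to ..."
            self._good_steps = 0
            return False           # "Skipping step"
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= 2.0      # grow back after sustained stability
        return True

scaler = LossScaler()
scaler.step([float("inf"), 1.0])   # overflow: step skipped, scale halved
print(scaler.scale)                # scale is now 32768.0
```

An occasional overflow while the scaler settles is expected with fp16; it signals a real problem mainly when the scale keeps shrinking toward zero while the loss diverges, as the `grad_norm ... (inf)` running average in the log above suggests checking.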