chengyangfu/pytorch-vgg-cifar10

I ran your code and always get a NaN loss. Can you help me?

JiyueWang opened this issue · 1 comment

I just ran ./run.sh and got a NaN loss after a few steps of the second epoch (epoch 1). Here is the printed log:

(base) root@For-Judy-And-Ian:~/pytorchProjects/pytorch-vgg-cifar10-master# ./run.sh
python main.py --arch=vgg11 --save-dir=save_vgg11 |& tee -a log_vgg11
Files already downloaded and verified
Epoch: [0 ][ 0 /391] Time 0.831 (0.831) Data 0.190 (0.190) Loss 2.3037 (2.3037) Prec@1 10.938 (10.938)
Epoch: [0 ][20 /391] Time 0.018 (0.052) Data 0.000 (0.009) Loss 2.2982 (2.3029) Prec@1 9.375 (9.487)
Epoch: [0 ][40 /391] Time 0.012 (0.035) Data 0.000 (0.005) Loss 2.2928 (2.3018) Prec@1 12.500 (9.546)
Epoch: [0 ][60 /391] Time 0.012 (0.030) Data 0.000 (0.003) Loss 2.2685 (2.2970) Prec@1 15.625 (10.720)
Epoch: [0 ][80 /391] Time 0.012 (0.027) Data 0.000 (0.002) Loss 2.1417 (2.2787) Prec@1 21.875 (11.960)
Epoch: [0 ][100/391] Time 0.016 (0.026) Data 0.000 (0.002) Loss 2.1417 (2.2518) Prec@1 22.656 (13.134)
Epoch: [0 ][120/391] Time 0.029 (0.024) Data 0.000 (0.002) Loss 1.9975 (2.2189) Prec@1 21.094 (14.463)
Epoch: [0 ][140/391] Time 0.028 (0.024) Data 0.000 (0.002) Loss 2.0889 (2.1959) Prec@1 26.562 (15.459)
Epoch: [0 ][160/391] Time 0.018 (0.023) Data 0.000 (0.001) Loss 2.0179 (2.1856) Prec@1 21.875 (16.193)
Epoch: [0 ][180/391] Time 0.012 (0.023) Data 0.000 (0.001) Loss 1.9825 (2.1645) Prec@1 25.000 (16.894)
Epoch: [0 ][200/391] Time 0.012 (0.022) Data 0.000 (0.001) Loss 1.8724 (2.1434) Prec@1 27.344 (17.623)
Epoch: [0 ][220/391] Time 0.012 (0.022) Data 0.000 (0.001) Loss 2.0147 (2.1258) Prec@1 25.000 (18.121)
Epoch: [0 ][240/391] Time 0.012 (0.021) Data 0.000 (0.001) Loss 1.8679 (2.1128) Prec@1 22.656 (18.458)
Epoch: [0 ][260/391] Time 0.016 (0.021) Data 0.000 (0.001) Loss 1.8262 (2.0923) Prec@1 28.125 (19.202)
Epoch: [0 ][280/391] Time 0.012 (0.021) Data 0.000 (0.001) Loss 1.7779 (2.0737) Prec@1 31.250 (19.834)
Epoch: [0 ][300/391] Time 0.011 (0.020) Data 0.000 (0.001) Loss 1.7415 (2.0569) Prec@1 38.281 (20.359)
Epoch: [0 ][320/391] Time 0.012 (0.020) Data 0.000 (0.001) Loss 1.7895 (2.0431) Prec@1 26.562 (20.863)
Epoch: [0 ][340/391] Time 0.012 (0.020) Data 0.000 (0.001) Loss 1.7198 (2.0292) Prec@1 31.250 (21.355)
Epoch: [0 ][360/391] Time 0.012 (0.019) Data 0.000 (0.001) Loss 1.9042 (2.0171) Prec@1 27.344 (21.827)
Epoch: [0 ][380/391] Time 0.012 (0.019) Data 0.000 (0.001) Loss 2.6430 (2.0338) Prec@1 12.500 (21.900)
Test[0/79] Time 0.136 (0.136) Loss 2.3228 (2.3228) Prec@1 10.938 (10.938)
Test[20/79] Time 0.004 (0.013) Loss 2.3267 (2.3337) Prec@1 7.812 (8.891)
Test[40/79] Time 0.013 (0.009) Loss 2.3235 (2.3322) Prec@1 10.156 (8.670)
Test[60/79] Time 0.011 (0.009) Loss 2.3311 (2.3303) Prec@1 10.156 (8.799)
* Prec@1 8.810
Epoch: [1 ][ 0 /391] Time 0.099 (0.099) Data 0.085 (0.085) Loss 2.3538 (2.3538) Prec@1 8.594 (8.594)
Epoch: [1 ][20 /391] Time 0.028 (0.021) Data 0.000 (0.005) Loss nan (nan) Prec@1 1.562 (8.036)
Epoch: [1 ][40 /391] Time 0.018 (0.019) Data 0.000 (0.003) Loss nan (nan) Prec@1 1.562 (5.011)
Epoch: [1 ][60 /391] Time 0.012 (0.017) Data 0.000 (0.002) Loss nan (nan) Prec@1 1.562 (3.893)
Epoch: [1 ][80 /391] Time 0.012 (0.016) Data 0.000 (0.002) Loss nan (nan) Prec@1 2.344 (3.279)
Epoch: [1 ][100/391] Time 0.013 (0.016) Data 0.002 (0.001) Loss nan (nan) Prec@1 2.344 (2.908)
Epoch: [1 ][120/391] Time 0.017 (0.015) Data 0.000 (0.001) Loss nan (nan) Prec@1 3.906 (2.686)
Epoch: [1 ][140/391] Time 0.012 (0.016) Data 0.000 (0.001) Loss nan (nan) Prec@1 2.344 (2.549)
Epoch: [1 ][160/391] Time 0.012 (0.015) Data 0.000 (0.001) Loss nan (nan) Prec@1 2.344 (2.451)
Epoch: [1 ][180/391] Time 0.017 (0.016) Data 0.000 (0.001) Loss nan (nan) Prec@1 3.906 (2.348)
Epoch: [1 ][200/391] Time 0.012 (0.016) Data 0.000 (0.001) Loss nan (nan) Prec@1 2.344 (2.320)
Epoch: [1 ][220/391] Time 0.011 (0.015) Data 0.000 (0.001) Loss nan (nan) Prec@1 0.781 (2.238)
Epoch: [1 ][240/391] Time 0.012 (0.015) Data 0.000 (0.001) Loss nan (nan) Prec@1 0.781 (2.217)
Epoch: [1 ][260/391] Time 0.013 (0.015) Data 0.000 (0.001) Loss nan (nan) Prec@1 1.562 (2.176)
Epoch: [1 ][280/391] Time 0.012 (0.015) Data 0.000 (0.001) Loss nan (nan) Prec@1 2.344 (2.149)
Epoch: [1 ][300/391] Time 0.016 (0.015) Data 0.000 (0.001) Loss nan (nan) Prec@1 1.562 (2.108)
Epoch: [1 ][320/391] Time 0.018 (0.015) Data 0.007 (0.001) Loss nan (nan) Prec@1 0.781 (2.078)
Epoch: [1 ][340/391] Time 0.017 (0.015) Data 0.006 (0.001) Loss nan (nan) Prec@1 3.125 (2.067)
Epoch: [1 ][360/391] Time 0.018 (0.015) Data 0.006 (0.001) Loss nan (nan) Prec@1 3.906 (2.052)
Epoch: [1 ][380/391] Time 0.012 (0.016) Data 0.000 (0.001) Loss nan (nan) Prec@1 0.781 (2.010)
Test[0/79] Time 0.094 (0.094) Loss nan (nan) Prec@1 0.000 (0.000)
Test[20/79] Time 0.009 (0.014) Loss nan (nan) Prec@1 0.000 (0.335)
Test[40/79] Time 0.015 (0.015) Loss nan (nan) Prec@1 0.000 (0.419)
Test[60/79] Time 0.015 (0.015) Loss nan (nan) Prec@1 0.000 (0.538)
* Prec@1 0.540

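Change your learning rate to 0.001. In your log the loss jumps to 2.6430 at the end of epoch 0 and becomes NaN early in epoch 1, which usually means the learning rate is too high and the updates are diverging.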
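To illustrate why a smaller learning rate can help, here is a minimal pure-Python sketch of gradient descent on a toy quadratic loss. It is not the repository's training code: the function name `gradient_descent`, the `curvature` value, and the step count are all assumptions chosen for illustration. It only shows the general effect that a step size that is too large makes the loss grow until it is no longer finite, while a smaller one lets it shrink.

```python
import math


def gradient_descent(lr, steps=1000, curvature=100.0, w0=1.0):
    """Minimize the toy loss 0.5 * curvature * w**2 with plain gradient descent.

    Hypothetical helper for illustration only. Returns the recorded loss
    values and stops at the first loss that is inf or NaN.
    """
    w = w0
    losses = []
    for _ in range(steps):
        loss = 0.5 * curvature * w * w
        losses.append(loss)
        if not math.isfinite(loss):
            break  # stop instead of continuing to update on inf/NaN
        grad = curvature * w      # derivative of the toy loss w.r.t. w
        w = w - lr * grad         # plain gradient-descent update
    return losses


# With this curvature, lr=0.05 multiplies w by -4 each step (diverges),
# while lr=0.001 multiplies w by 0.9 each step (converges).
diverging = gradient_descent(lr=0.05)
converging = gradient_descent(lr=0.001)

print("lr=0.05  finite at end:", math.isfinite(diverging[-1]))
print("lr=0.001 finite at end:", math.isfinite(converging[-1]))
```

In a PyTorch training script the corresponding knob is the `lr` argument passed to the optimizer (for example `torch.optim.SGD`). A similar `math.isfinite` check on the scalar loss value can be used to stop a run as soon as it diverges.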