cannot reproduce hg_s1_b1 result
Closed this issue · 6 comments
I noticed that in your log here you use a larger learning rate of 0.001 and schedule=[150, 175, 200]. Below is part of your log:
Epoch LR Train Loss Val Loss Train Acc Val Acc
1.000000 0.001000 0.001369 0.000828 0.070879 0.138562
2.000000 0.001000 0.000856 0.001058 0.158208 0.200655
3.000000 0.001000 0.000758 0.000854 0.213208 0.208725
4.000000 0.001000 0.000699 0.000596 0.281929 0.384714
5.000000 0.001000 0.000635 0.000575 0.337208 0.440630
6.000000 0.001000 0.000582 0.000541 0.421062 0.487058
7.000000 0.001000 0.000559 0.000521 0.467490 0.538204
8.000000 0.001000 0.000536 0.000495 0.514954 0.582253
9.000000 0.001000 0.000520 0.000483 0.549438 0.609111
10.000000 0.001000 0.000506 0.000469 0.574788 0.634015
11.000000 0.001000 0.000497 0.000475 0.595450 0.629678
12.000000 0.001000 0.000488 0.000458 0.610554 0.655569
13.000000 0.001000 0.000481 0.000464 0.621428 0.642120
14.000000 0.001000 0.000475 0.000444 0.634942 0.674910
15.000000 0.001000 0.000470 0.000445 0.643844 0.672073
16.000000 0.001000 0.000465 0.000457 0.649695 0.644244
17.000000 0.001000 0.000461 0.000434 0.657655 0.692058
18.000000 0.001000 0.000455 0.000432 0.669486 0.699718
19.000000 0.001000 0.000451 0.000431 0.675828 0.704502
20.000000 0.001000 0.000450 0.000427 0.676318 0.705441
21.000000 0.001000 0.000447 0.000423 0.685184 0.715312
22.000000 0.001000 0.000444 0.000439 0.687975 0.685048
23.000000 0.001000 0.000440 0.000420 0.694823 0.718964
24.000000 0.001000 0.000439 0.000423 0.697721 0.718909
25.000000 0.001000 0.000435 0.000417 0.704000 0.727210
26.000000 0.001000 0.000433 0.000420 0.706374 0.718607
27.000000 0.001000 0.000432 0.000414 0.706610 0.727208
28.000000 0.001000 0.000429 0.000415 0.713337 0.726208
29.000000 0.001000 0.000426 0.000414 0.718950 0.731994
My training, however, diverges drastically with the same lr and schedule as yours, whether momentum=0 (the default) or 0.1 (your model's internal parameter):
1.000000 0.001000 0.000911 0.001155 0.144576 0.245235
2.000000 0.001000 0.000635 6.696480 0.292642 0.002924
3.000000 0.001000 0.000599 79.269006 0.368526 0.000000
4.000000 0.001000 0.000577 342.079974 0.411786 0.001092
5.000000 0.001000 0.000560 1973.556534 0.447012 0.000176
Are there any other default parameters that should be changed?
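For reference, a multi-step decay like schedule=[150, 175, 200] can be sketched as below. The base lr (0.001) and the milestone epochs come from the log above; the decay factor gamma=0.1 is an assumption (the thread never states it), so treat this as a sketch, not the repo's exact scheduler.

```python
def lr_at_epoch(epoch, base_lr=1e-3, schedule=(150, 175, 200), gamma=0.1):
    """Multi-step decay: multiply the LR by gamma at each milestone epoch.

    NOTE: gamma=0.1 is a common default, assumed here; only base_lr and
    the milestone epochs are given in the thread.
    """
    lr = base_lr
    for milestone in schedule:
        if epoch >= milestone:
            lr *= gamma
    return lr

print(lr_at_epoch(1))    # base lr, before any milestone
print(lr_at_epoch(160))  # after the first milestone
```

With this decay the lr stays at 0.001 for the first ~150 epochs, which matches the constant 0.001 column in the log above.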
Hi @GarrickLin, same for me. I have even tried the hg_s8_b1
architecture by running the provided hg_s8_b1.sh
file (available in the drive files).
Were there any substantial changes after your experiments, @bearpaw?
Here are my results so far:
Epoch: 2 | LR: 0.00050000
Processing |################################| (1854/1854) Data: 0.000218s | Batch: 0.484s | Total: 0:17:43 | ETA: 0:00:01 | Loss: 0.0061 | Acc: 0.0001
Processing |################################| (247/247) Data: 0.000111s | Batch: 0.384s | Total: 0:01:34 | ETA: 0:00:01 | Loss: 0.0065 | Acc: 0.0012
Epoch: 3 | LR: 0.00050000
Processing |################################| (1854/1854) Data: 0.000162s | Batch: 0.393s | Total: 0:17:18 | ETA: 0:00:01 | Loss: 0.0056 | Acc: 0.0007
Processing |################################| (247/247) Data: 0.000095s | Batch: 0.380s | Total: 0:01:33 | ETA: 0:00:01 | Loss: 0.0250 | Acc: 0.0004
Epoch: 4 | LR: 0.00050000
Processing |################################| (1854/1854) Data: 0.000209s | Batch: 0.858s | Total: 0:17:30 | ETA: 0:00:01 | Loss: 0.0049 | Acc: 0.0015
Processing |################################| (247/247) Data: 0.000110s | Batch: 0.379s | Total: 0:01:33 | ETA: 0:00:01 | Loss: 0.7430 | Acc: 0.0000
Epoch: 5 | LR: 0.00050000
Processing |################################| (1854/1854) Data: 0.000161s | Batch: 0.402s | Total: 0:17:45 | ETA: 0:00:01 | Loss: 0.0044 | Acc: 0.0033
Processing |################################| (247/247) Data: 0.000104s | Batch: 0.396s | Total: 0:01:37 | ETA: 0:00:01 | Loss: 3.5382 | Acc: 0.0000
Epoch: 6 | LR: 0.00050000
Processing |################################| (1854/1854) Data: 0.000187s | Batch: 0.442s | Total: 0:17:53 | ETA: 0:00:01 | Loss: 0.0043 | Acc: 0.0064
Processing |################################| (247/247) Data: 0.000093s | Batch: 0.393s | Total: 0:01:37 | ETA: 0:00:01 | Loss: 20.5096 | Acc: 0.0000
@mkocabas try reducing the learning rate; that works for me.
Which value did you use? 2.5e-4? Did you keep RMSprop as the optimizer?
I didn't change the default optimizer, though that might be the problem. Training works after reducing the initial learning rate to a smaller value (I don't remember exactly which); you can give it a try.
Thanks!
Sorry for the confusion: for batch size 6, you should use lr 2.5e-4. You can use a larger lr for a larger batch size.
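One way to read "larger lr for larger batch size" is the linear-scaling heuristic, sketched below. The 2.5e-4-at-batch-6 anchor comes from this thread; linear scaling itself is an assumption about how to extrapolate, not something the author stated.

```python
def scaled_lr(batch_size, ref_lr=2.5e-4, ref_batch=6):
    """Linear-scaling heuristic: grow the LR proportionally with batch size.

    NOTE: ref_lr=2.5e-4 for batch size 6 is from the thread; the linear
    rule is an assumption and may need tuning for RMSprop.
    """
    return ref_lr * batch_size / ref_batch

print(scaled_lr(6))   # the thread's recommended lr for batch size 6
print(scaled_lr(24))  # a candidate lr for a 4x larger batch
```

Under this rule, batch size 24 would suggest lr 1e-3, i.e. the value the original log was trained with at its (presumably larger) batch size.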