wangkuiyi/gotorch

MNIST training examples on GoTorch and LibTorch do not converge to the same accuracy

Yancey1989 opened this issue · 1 comment

The LibTorch MNIST example reaches a loss of 0.0269 after 5 epochs:

Epoch 0, Loss: 0.1280
Epoch 1, Loss: 0.0659
Epoch 2, Loss: 0.0396
Epoch 3, Loss: 0.0304
Epoch 4, Loss: 0.0269

The GoTorch MNIST example reaches a loss of 1.4148 after 5 epochs:

2020/08/12 22:37:41 Epoch: 0, Loss: 4.8264
2020/08/12 22:37:46 Epoch: 1, Loss: 5.9624
2020/08/12 22:37:52 Epoch: 2, Loss: 2.4493
2020/08/12 22:37:58 Epoch: 3, Loss: 0.9619
2020/08/12 22:38:04 Epoch: 4, Loss: 1.4148

After merging #115 and #116, the Go version gets a similar, but not identical, loss:

2020/08/13 00:40:43 Epoch: 0, Loss: 0.2228
2020/08/13 00:40:48 Epoch: 1, Loss: 0.1510
2020/08/13 00:40:53 Epoch: 2, Loss: 0.0509
2020/08/13 00:40:58 Epoch: 3, Loss: 0.0622
2020/08/13 00:41:04 Epoch: 4, Loss: 0.0786

After comparing the initialized fully-connected weights between the C++ and GoTorch versions, fc1.weight is the same, but fc2.weight is different. The following code snippets reproduce the problem minimally.

C++ code and weight values

torch::manual_seed(1);
torch::nn::Linear fc1(2,3);
torch::nn::Linear fc2(2,3);
std::cout << fc1->weight <<  std::endl;
std::cout << fc2->weight <<  std::endl;
 0.3643 -0.3121
-0.1371  0.3319
-0.6657  0.4241
[ CPUFloatType{3,2} ]
-0.0866  0.1961
 0.0349  0.2583
-0.2756 -0.0516
[ CPUFloatType{3,2} ]

Go code and weight values

initializer.ManualSeed(1)
fc1 := nn.Linear(2, 3, false)
fc2 := nn.Linear(2, 3, false)
log.Print(fc1.Weight)
log.Print(fc2.Weight)
0.3643 -0.3121
-0.1371  0.3319
-0.6657  0.4241
[ CPUFloatType{3,2} ]
-0.1455  0.3597
 0.0983 -0.0866
 0.1961  0.0349
[ CPUFloatType{3,2} ]