wangkuiyi/gotorch

MNIST training examples on GoTorch and LibTorch do not converge to the same accuracy

Yancey1989 opened this issue · 1 comment

The LibTorch MNIST example reaches a loss of 0.0269 after 5 epochs:

Epoch 0, Loss: 0.1280
Epoch 1, Loss: 0.0659
Epoch 2, Loss: 0.0396
Epoch 3, Loss: 0.0304
Epoch 4, Loss: 0.0269

The GoTorch MNIST example reaches a loss of 1.4148 after 5 epochs:

2020/08/12 22:37:41 Epoch: 0, Loss: 4.8264
2020/08/12 22:37:46 Epoch: 1, Loss: 5.9624
2020/08/12 22:37:52 Epoch: 2, Loss: 2.4493
2020/08/12 22:37:58 Epoch: 3, Loss: 0.9619
2020/08/12 22:38:04 Epoch: 4, Loss: 1.4148

After merging #115 and #116, the Go version gets a similar, but not identical, loss:

2020/08/13 00:40:43 Epoch: 0, Loss: 0.2228
2020/08/13 00:40:48 Epoch: 1, Loss: 0.1510
2020/08/13 00:40:53 Epoch: 2, Loss: 0.0509
2020/08/13 00:40:58 Epoch: 3, Loss: 0.0622
2020/08/13 00:41:04 Epoch: 4, Loss: 0.0786

After comparing the initialized fully-connected weights between the C++ and GoTorch versions, fc1.weight is the same, but fc2.weight is different. The following code snippets reproduce the problem minimally.

C++ code and weight values

torch::manual_seed(1);
torch::nn::Linear fc1(2,3);
torch::nn::Linear fc2(2,3);
std::cout << fc1->weight <<  std::endl;
std::cout << fc2->weight <<  std::endl;
 0.3643 -0.3121
-0.1371  0.3319
-0.6657  0.4241
[ CPUFloatType{3,2} ]
-0.0866  0.1961
 0.0349  0.2583
-0.2756 -0.0516
[ CPUFloatType{3,2} ]

Go code and weight values

initializer.ManualSeed(1)
fc1 := nn.Linear(2, 3, false)
fc2 := nn.Linear(2, 3, false)
log.Print(fc1.Weight)
log.Print(fc2.Weight)
0.3643 -0.3121
-0.1371  0.3319
-0.6657  0.4241
[ CPUFloatType{3,2} ]
-0.1455  0.3597
 0.0983 -0.0866
 0.1961  0.0349
[ CPUFloatType{3,2} ]