This experiment used accuracy as the evaluation metric.
Target sequences are zero-padded to the maximum sequence length.
Accuracy can be misleading on imbalanced data, and this problem is imbalanced because the padded target sequences contain many zero-padding tokens. Accuracy was used nonetheless because the model was trained without masking the zero-padding tokens in the target sequences.
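A minimal sketch of why unmasked accuracy can mislead here. It assumes token id 0 is the padding token and that sequences are right-padded (the document does not say which side is padded). The helper names `pad_sequences_right`, `token_accuracy`, and `masked_token_accuracy` are hypothetical, not taken from the original code. A model that only ever predicts the padding token still scores well on plain token accuracy:

```python
import numpy as np

PAD_ID = 0  # assumption: token id 0 is the zero-padding token


def pad_sequences_right(seqs, max_len, pad_id=PAD_ID):
    """Right-pad (and truncate) integer sequences to max_len (hypothetical helper)."""
    out = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
    for i, s in enumerate(seqs):
        s = list(s)[:max_len]
        out[i, :len(s)] = s
    return out


def token_accuracy(y_true, y_pred):
    """Fraction of ALL positions (padding included) where the prediction matches."""
    return float(np.mean(y_true == y_pred))


def masked_token_accuracy(y_true, y_pred, pad_id=PAD_ID):
    """Fraction of non-padding positions where the prediction matches."""
    mask = y_true != pad_id
    return float(np.sum((y_true == y_pred) & mask) / np.sum(mask))


targets = pad_sequences_right([[5, 8, 2], [7, 3]], max_len=6)
# A degenerate "model" that always predicts the padding token.
preds = np.zeros_like(targets)

print(token_accuracy(targets, preds))         # 7 of 12 positions match
print(masked_token_accuracy(targets, preds))  # 0.0
```

Masked accuracy counts only the real tokens, so it exposes this failure mode; it is an alternative the experiment did not use.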
1st model experimental results.
scenario 1 evaluation results.
loss function : categorical crossentropy
loss : 1784.4952
accuracy : 0.2438
test loss : 1534.7498
test accuracy : 0.2537
scenario 2 evaluation results.
loss function : categorical crossentropy
loss : 1825.4198
accuracy : 0.2274
test loss : 1795.3251
test accuracy : 0.2537
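For reference, categorical crossentropy over a target sequence is commonly computed per position as the negative log-probability assigned to the true token, then averaged. The NumPy sketch below assumes one-hot targets and probability (softmax) outputs; the function name is hypothetical, and the document does not state how its reported losses are reduced over positions. Without a mask, padding positions enter this average like any other token.

```python
import numpy as np


def categorical_crossentropy(y_true_onehot, y_prob, eps=1e-7):
    """Mean over positions of -sum_k y_k * log(p_k) (hypothetical helper)."""
    p = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    per_position = -np.sum(y_true_onehot * np.log(p), axis=-1)
    return float(np.mean(per_position))


# Two target positions over a 3-token vocabulary.
y_true = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]])

loss = categorical_crossentropy(y_true, y_prob)
print(loss)  # ln(2), about 0.6931
```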
2nd model experimental results.
Optimizer : Adam with beta_1 = 0.1, beta_2 = 0.1, and a learning rate initialized at 0.00001 that decays exponentially with a decay rate of 0.9
loss function : categorical crossentropy
loss : 1.2822
accuracy : 0.8250
test loss : 1.2836
test accuracy : 0.8249
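The 2nd model's optimizer settings can be illustrated with a plain NumPy sketch of the standard Adam update and an exponential learning-rate schedule. Assumptions: `decay_steps=1000` and `eps=1e-7` are hypothetical values the experiment does not state, and the helper names are illustrative only. With beta_1 = beta_2 = 0.1 the moment estimates are dominated by the current gradient, so each step behaves roughly like a normalized, sign-of-gradient step of size about lr.

```python
import numpy as np


def exponential_decay(step, initial_lr=1e-5, decay_rate=0.9, decay_steps=1000):
    """lr = initial_lr * decay_rate ** (step / decay_steps).

    decay_steps=1000 is a hypothetical value; the experiment does not state it.
    """
    return initial_lr * decay_rate ** (step / decay_steps)


def adam_step(param, grad, m, v, t, lr, beta_1=0.1, beta_2=0.1, eps=1e-7):
    """One standard Adam update (t starts at 1). Returns (param, m, v)."""
    m = beta_1 * m + (1.0 - beta_1) * grad       # first-moment estimate
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2  # second-moment estimate
    m_hat = m / (1.0 - beta_1 ** t)              # bias correction
    v_hat = v / (1.0 - beta_2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# First step on a scalar parameter with gradient 2.0: the update is about lr.
lr0 = exponential_decay(0)
p, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1, lr=lr0)
print(lr0, exponential_decay(1000), p)
```

If the original model was built with Keras, a comparable configuration would pass `tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=1e-5, decay_steps=..., decay_rate=0.9)` as the `learning_rate` of `tf.keras.optimizers.Adam(beta_1=0.1, beta_2=0.1)`; the framework is an assumption, not something the results state.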
Plots of loss and accuracy
[Figure: plot of loss]
[Figure: plot of accuracy]
Test accuracy is slightly higher than train accuracy.
Additional Experiment
After training the model, I ran an additional experiment, and the final results of the 2nd model improved noticeably.
loss : 0.9736
accuracy : 0.8507
test loss : 0.9210
test accuracy : 0.8490
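To put a number on the improvement, the short script below compares the 2nd model's metrics before and after the additional experiment, using only the values reported in this section. The helper name `improvements` is illustrative.

```python
# Final metrics of the 2nd model, copied from the results above.
before = {"loss": 1.2822, "accuracy": 0.8250, "test loss": 1.2836, "test accuracy": 0.8249}
after = {"loss": 0.9736, "accuracy": 0.8507, "test loss": 0.9210, "test accuracy": 0.8490}


def improvements(before, after):
    """Absolute change and relative change (%) for each metric."""
    return {
        name: (after[name] - before[name], (after[name] - before[name]) / before[name] * 100)
        for name in before
    }


for name, (delta, rel) in improvements(before, after).items():
    print(f"{name:>13}: {before[name]:.4f} -> {after[name]:.4f} ({delta:+.4f}, {rel:+.1f}%)")
```

Test accuracy rises by 0.0241 (about +2.9%) and test loss falls by 0.3626 (about -28.2%).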
These results suggest that the 2nd model has the potential to be a good translation model.