Report of Empirical Evaluation, Observations, and Investigations on LSTM-Based Language Modeling

| Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
|---|---|---|---|---|---|---|
| HLSTM | 1.826 | 1.779 | 64 | 0.007813 | 25 | moses |
| HLSTM | 1.952 | 1.907 | 143 | 2.98E-08 | 25 | Split |
| HLSTM | 2.38 | 2.296 | 155 | 3.81E-06 | 35 | moses |
| HLSTM | 2.847 | 2.782 | 113 | 0.00048828125 | 35 | Basic_English |
| HLSTM | 21.98 | 20.39 | 139 | 2.98E-08 | 50 | moses |
| HLSTM | 47.19 | 45.49 | 152 | 2.38E-07 | 50 | Basic_English |
| HLSTM | 51.75 | 49.58 | 151 | 2.33E-10 | 50 | Split |
| HLSTM | 68.75 | 63.34 | 122 | 3.81E-06 | 70 | moses |

| Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
|---|---|---|---|---|---|---|
| H2HLSTM | 3.069 | 2.987 | 145 | 4.66E-10 | 25 | Split |
| H2HLSTM | 4.085 | 3.855 | 117 | 1.91E-06 | 35 | moses |
| H2HLSTM-NTASGD | 32.32 | 31.01 | 108 | 9.53E-07 | 50 | Basic_English |
| H2HLSTM | 34.8 | 33.59 | 97 | 6.10E-05 | 50 | Basic_English |
| H2HLSTM | 38.15 | 35.18 | 88 | 2.98E-08 | 50 | moses |
| H2HLSTM | 67.48 | 64.29 | 99 | 2.29E-05 | 50 | Split |

| Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
|---|---|---|---|---|---|---|
| HLSTM | 1.885 | 1.942 | 12 (24h) | 0.5 | 25 | Split |

| Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
|---|---|---|---|---|---|---|
| HLSTM | 1.964 | 1.782 | 149 | 2.91E-11 | 25 | Split |
| HLSTM | 2.651 | 2.651 | 2.355 | 1.49E-08 | 21 | Basic_English |
| H2HLSTM | 3.94 | 3.44 | 123 | 1.19E-07 | 21 | moses |
| HLSTM | 87.17 | 79.64 | 78 | 3.81E-06 | 50 | Basic_English |
| LSTM | 87.75 | 74.87 | 66 | 1.90E-06 | 21 | moses |