Configurations: cases 1-3 and case 5 use model_dim=512, ffn_dim=1024; case 4 uses model_dim=1024, ffn_dim=4096. Cases 1-4 use drop_out=0.1; case 5 uses drop_out=0.3.
| | case1 | case2 | case3 | case4 | case5 |
|---|---|---|---|---|---|
| weight decay | other_weightdecay=0, fc2_weightdecay=0.0001 | all_weightdecay=0 | all_weightdecay=0.0001 | other_weightdecay=0, fc2_weightdecay=0.0001 | other_weightdecay=0, fc2_weightdecay=0.0001, drop_out=0.3 |
| BLEU | 30.80 | 30.70 | 28.84 | 23.82 | 27.55 |
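The cases above split weight decay between fc2 and the remaining parameters. Below is a minimal sketch of how such a split can be expressed with PyTorch optimizer parameter groups; the function name and the assumption that the relevant parameters have "fc2" in their names are illustrative, not taken from this repo's code.

```python
import torch

def build_optimizer(model, fc2_weight_decay=1e-4, other_weight_decay=0.0, lr=5e-4):
    """Give parameters whose name contains 'fc2' their own weight_decay,
    and use other_weight_decay for everything else."""
    fc2_params, other_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (fc2_params if "fc2" in name else other_params).append(param)
    return torch.optim.Adam(
        [
            {"params": fc2_params, "weight_decay": fc2_weight_decay},
            {"params": other_params, "weight_decay": other_weight_decay},
        ],
        lr=lr,
    )
```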
| | Transformers | ffn_dim*1.05 | ffn_dim*1.10 | ffn_dim*1.15 | ffn_dim*1.20 | ffn_dim*1.25 | ffn_dim*1.30 |
|---|---|---|---|---|---|---|---|
| Parameter count | | other=35963400, fc2=6610944 | other=36277356, fc2=6924288 | other=36591312, fc2=7237632 | other=36905268, fc2=7550976 | other=37225380, fc2=7870464 | other=37539336, fc2=8183808 |
| BLEU | 33.37 | 31.16 | 31.76 | 31.69 | 31.51 | 31.93 | 31.60 |
(BLEU scores are from the Transformer reproduction run with this code.)
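For the "Parameter count" row in the table above, a hedged sketch of how the other-vs-fc2 split could be computed from a model; whether the listed numbers include bias terms is not stated in the original, so this only illustrates the bookkeeping under the assumption that the second FFN projection is named "fc2".

```python
def count_fc2_split(model):
    """Split the parameter count into fc2 vs. everything else,
    matching the 'other=... fc2=...' bookkeeping in the table above."""
    fc2 = sum(p.numel() for n, p in model.named_parameters() if "fc2" in n)
    other = sum(p.numel() for n, p in model.named_parameters() if "fc2" not in n)
    return {"other": other, "fc2": fc2}
```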
case_1: casetransformer_layer_1.py (random initialization → SVD decomposition → the middle factor is updated by back-propagation, the left and right factors are fixed; see the sketch after this list)
case_2: transformer_layer_2.py (random initialization → SVD decomposition → the middle factor is set to 1 and fixed, the left and right factors are fixed → all three factors are fixed)
case_3: transformer_layer_3.py (random initialization → SVD decomposition → the middle factor is set to 1 with the left and right factors fixed → the middle factor is then updated by back-propagation starting from 1)
case_4: transformer_layer_4.py (one layer is removed directly)
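A minimal sketch of the SVD re-parameterization behind cases 1-3: the weight is factored once, the left/right factors are frozen, and only the middle (singular-value) vector is handled differently per case. The class name, the mode names, and the choice of which linear layer is factored are illustrative assumptions, not the repo's exact implementation.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Linear layer W ≈ U diag(s) V^T with U and V frozen.

    mode='train_s'   -> case_1: s starts from the SVD values and is updated by BP
    mode='fixed_one' -> case_2: s is set to 1 and frozen (everything fixed)
    mode='train_one' -> case_3: s is set to 1 and then updated by BP
    """

    def __init__(self, in_features, out_features, mode="train_s"):
        super().__init__()
        weight = torch.randn(out_features, in_features)            # random init
        u, s, vh = torch.linalg.svd(weight, full_matrices=False)   # SVD split
        self.register_buffer("u", u)     # left factor, never updated
        self.register_buffer("vh", vh)   # right factor, never updated
        if mode == "train_s":
            self.s = nn.Parameter(s)
        elif mode == "fixed_one":
            self.register_buffer("s", torch.ones_like(s))
        elif mode == "train_one":
            self.s = nn.Parameter(torch.ones_like(s))
        else:
            raise ValueError(mode)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        weight = self.u @ torch.diag(self.s) @ self.vh   # rebuild W each step
        return nn.functional.linear(x, weight, self.bias)
```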
| Transformers | Transformers+ELM (fixed) | Transformers+ELM (fixed + node*1.2) | Transformers+ELM (fixed + node*1.5) | case_1 | case_2 | case_3 | case_4 |
|---|---|---|---|---|---|---|---|
| 33.37 | 29.25 | 27.98 | 29.51 | 8.18 | 30.67 | 30.82 | 29.04 |
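The "Transformers+ELM" columns freeze part of the FFN at its random initialization (extreme-learning-machine style) and optionally widen the hidden layer by a node multiplier. A minimal sketch under the assumption that it is the first FFN projection that is frozen; the function and argument names are illustrative.

```python
import torch.nn as nn

def make_elm_ffn(model_dim=512, ffn_dim=1024, node_multiplier=1.0, dropout=0.1):
    """FFN block whose first projection stays at its random initialization
    (ELM-style fixed features); node_multiplier widens the hidden layer,
    e.g. 1.2 or 1.5 as in the table above."""
    hidden = int(ffn_dim * node_multiplier)
    fc1 = nn.Linear(model_dim, hidden)
    fc1.weight.requires_grad = False    # fixed random hidden layer
    fc1.bias.requires_grad = False
    fc2 = nn.Linear(hidden, model_dim)  # trained as usual
    return nn.Sequential(fc1, nn.ReLU(), nn.Dropout(dropout), fc2)
```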
case_5: transformer_case5.py (QR decomposition; Q is used as the weight and is not updated afterwards; the bias is updated starting from 0; see the sketch after this list)
case_6: transformer_case5.py (QR decomposition; Q is used as the weight and is not updated afterwards; the bias is updated starting from randn)
case_7: transformer_case5.py (QR decomposition; Q is used as the weight and is not updated afterwards; the bias is updated starting from 0.1*randn)
case_8: transformer_case5.py (QR decomposition; Q is used as the weight and is not updated afterwards; the bias is 0 and is not updated)
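A minimal sketch of the QR variant behind cases 5-8: the weight is initialized randomly, factored as QR, and only Q is kept as a frozen weight; the cases differ only in how the bias is initialized and whether it is trained. Class and argument names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QRLinear(nn.Module):
    """Linear layer whose weight is the Q factor of a random matrix's QR
    decomposition and is never updated afterwards; only the bias may train.

    bias_init='zeros' -> case_5, 'randn' -> case_6, '0.1*randn' -> case_7;
    bias_init='zeros' with train_bias=False -> case_8.
    """

    def __init__(self, in_features, out_features, bias_init="zeros", train_bias=True):
        super().__init__()
        weight = torch.randn(out_features, in_features)   # random init
        if out_features >= in_features:
            q, _ = torch.linalg.qr(weight)                 # q: (out, in)
        else:
            q, _ = torch.linalg.qr(weight.t())             # q: (in, out)
            q = q.t()
        self.register_buffer("weight", q)                  # frozen Q factor
        if bias_init == "zeros":
            bias = torch.zeros(out_features)
        elif bias_init == "randn":
            bias = torch.randn(out_features)
        elif bias_init == "0.1*randn":
            bias = 0.1 * torch.randn(out_features)
        else:
            raise ValueError(bias_init)
        if train_bias:
            self.bias = nn.Parameter(bias)
        else:
            self.register_buffer("bias", bias)             # fixed bias (case_8)

    def forward(self, x):
        return nn.functional.linear(x, self.weight, self.bias)
```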
| case_5 | case_6 | case_7 | case_8 |
|---|---|---|---|
| 29.87 | 29.21 | 29.93 | 27.97 |
case_9: transformer_case9.py (Gaussian initialization + adjusted weight_decay for FC2; sketched after the file list below)
Since the changes are extensive, the following files were committed:
(1) "transformer_case9.py" [Transformer/modules/transformer_case9.py]
(2) "train2.py" [main folder/train2.py]
(3) "transformer2.py" [Transformer/models/transformer2.py]
| Gaussian init only | Gaussian + weight_decay=0.0001 | Gaussian + weight_decay=0.00005 | Gaussian + weight_decay=0.001 |
|---|---|---|---|
| 31.42 | | | |
| | model_dim=512, ffn_dim=1024, FFN_dropout=attention_dropout=0.1 | model_dim=512, ffn_dim=2048, FFN_dropout=attention_dropout=0.1 (base) | model_dim=1024, ffn_dim=4096, FFN_dropout=attention_dropout=0.3 (big) |
|---|---|---|---|
| Transformers | 33.37 | 32.79 | 29.74 |
| case_2 | 30.67 | 27.96 | 16.59 |
| case_4 | 29.04 | 28.20 | 25.23 |
The model used is the transformers (ltl) model, with the following parameter configuration:
model_dim: 512
ffn_dim: 1024
head_num: 4
encoder_layers: 6
decoder_layers: 6
droprate: 0.1
epoch: 10
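For reference, the same hyperparameters collected in a plain Python dict; the key names are illustrative and may not match the exact flags expected by train2.py.

```python
# Hypothetical config dict mirroring the hyperparameters listed above.
config = {
    "model_dim": 512,
    "ffn_dim": 1024,
    "head_num": 4,
    "encoder_layers": 6,
    "decoder_layers": 6,
    "droprate": 0.1,
    "epoch": 10,
}
```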