dict_file format
KoichiYasuoka opened this issue · 2 comments
I'm now trying to train with nagisa.fit on UD_Classical_Chinese-Kyoto (漢文) under the 4-level classified POS system (4階層品詞). See my blog for what I tried. I found a dict_file parameter in nagisa.fit and guessed that it is an external dictionary (外部辞書), but I could not find any explanation or usage of dict_file in your documentation. How do I use dict_file? Does it (and train_file) support the classified POS system?
Thank you for using nagisa. The dict_file parameter is used as an external dictionary, as you guessed. I'm sorry I didn't cover the details in the documentation. Please refer to sample.dict in sample_datasets. The dict_file consists of tab-delimited lines (word\tpostag), and it is effective for classifying POS tags.
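As a rough illustration of the tab-delimited layout described above, the sketch below writes and re-reads a small dictionary file using only the Python standard library. The helper names write_dict_file and read_dict_file, the file name example.dict, and the sample words and tags are all hypothetical; they are not part of nagisa, and sample.dict in sample_datasets remains the reference for the real format.

```python
import os
import tempfile


def write_dict_file(entries, path):
    """Write (word, postag) pairs, one tab-delimited pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for word, postag in entries:
            # A tab inside a field would break the word<TAB>postag layout.
            if "\t" in word or "\t" in postag:
                raise ValueError("tabs are reserved as the field delimiter")
            f.write(f"{word}\t{postag}\n")


def read_dict_file(path):
    """Read the file back, rejecting lines that are not word<TAB>postag."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {lineno}: expected 2 fields, got {len(fields)}"
                )
            entries.append((fields[0], fields[1]))
    return entries


# Illustrative entries only; the tags are placeholders, not a real tag set.
sample_entries = [("孔子", "PROPN"), ("曰", "VERB"), ("學", "VERB")]

dict_path = os.path.join(tempfile.mkdtemp(), "example.dict")
write_dict_file(sample_entries, dict_path)
loaded = read_dict_file(dict_path)
```

A file produced this way could then be passed as dict_file to nagisa.fit. Whether tag strings containing commas or other punctuation are accepted is not stated in this thread, so it is worth checking against sample.dict first.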
I read your blog post and ran a program to train on UD_Classical_Chinese-Kyoto. By tuning the hyperparameters, I was able to obtain an even better test POS-tagging F1 score. If you are interested, please run the program below.
nagisa.fit(train_file="lzh_kyoto-ud-train.txt", dev_file="lzh_kyoto-ud-dev.txt", test_file="lzh_kyoto-ud-test.txt", dict_file="lzh_udkanbun-dict.txt", model_name="lzh_kyoto-nagisa", dim_tagemb=32, decay=3)
Epoch  LR     Loss   Time_m  DevWS_f1  DevPOS_f1  TestWS_f1  TestPOS_f1
1      0.100  4.976  0.280   97.69     84.81      98.42      87.09
2      0.100  2.277  0.254   97.62     87.16      98.42      87.09
3      0.100  1.895  0.258   97.18     87.06      98.42      87.09
4      0.050  1.664  0.256   97.12     87.70      98.42      87.09
5      0.050  1.344  0.289   97.72     88.57      98.22      90.33
6      0.050  1.228  0.262   97.46     89.18      98.22      90.33
7      0.050  1.142  0.253   97.48     88.95      98.22      90.33
8      0.025  1.091  0.251   97.37     89.48      98.22      90.33
9      0.025  0.957  0.251   97.55     89.31      98.22      90.33
10     0.025  0.922  0.256   97.52     89.61      98.22      90.33
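For comparing runs, a log like the one above can be parsed with a few lines of Python. This is only a sketch: it assumes the log is whitespace-separated with a single header row, exactly as printed here, and the parse_log helper is a hypothetical name rather than anything provided by nagisa.

```python
LOG = """\
Epoch LR Loss Time_m DevWS_f1 DevPOS_f1 TestWS_f1 TestPOS_f1
1 0.100 4.976 0.280 97.69 84.81 98.42 87.09
2 0.100 2.277 0.254 97.62 87.16 98.42 87.09
3 0.100 1.895 0.258 97.18 87.06 98.42 87.09
4 0.050 1.664 0.256 97.12 87.70 98.42 87.09
5 0.050 1.344 0.289 97.72 88.57 98.22 90.33
6 0.050 1.228 0.262 97.46 89.18 98.22 90.33
7 0.050 1.142 0.253 97.48 88.95 98.22 90.33
8 0.025 1.091 0.251 97.37 89.48 98.22 90.33
9 0.025 0.957 0.251 97.55 89.31 98.22 90.33
10 0.025 0.922 0.256 97.52 89.61 98.22 90.33
"""


def parse_log(text):
    """Turn the whitespace-separated log into a list of dicts keyed by column name."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    records = []
    for row in body:
        # The first column is the integer epoch; the rest are floats.
        values = [int(row[0])] + [float(x) for x in row[1:]]
        records.append(dict(zip(header, values)))
    return records


records = parse_log(LOG)

# Pick the epoch with the highest POS-tagging F1 on the development set.
best = max(records, key=lambda r: r["DevPOS_f1"])
print(best["Epoch"], best["DevPOS_f1"], best["TestPOS_f1"])
```

For this run the highest DevPOS_f1 is 89.61 at epoch 10, where the reported TestPOS_f1 is 90.33.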