glample/tagger

Can my Chinese data be used in this program? (character-level)

PCR11 commented

Thanks for sharing this program; it is very useful. I ran it on the English corpus that you shared. Because of hardware limits, I changed the parameter --char_dim from 25 to 5 and --word_dim from 100 to 10, and got this result:
44424/46435 (95.66922%)
Score on dev: 88.03000
Score on test: 81.13000
13950, cost average: 0.043258
14000, cost average: 0.104217
Epoch 99 done. Average cost: 0.044536
Is this a normal result?
Also, I read in your paper that word representations are generated from the characters they are composed of. Does this mean that each English input word is split into characters, which are then passed through a bi-LSTM to generate a new word embedding?
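
A minimal sketch of the splitting step this question describes, assuming each token is simply broken into its characters and each character is mapped to an integer id. The names `word_to_char_ids` and `char_to_id` are hypothetical and are not taken from the tagger's code, and the bi-LSTM that would read these ids is not shown.

```python
def word_to_char_ids(word, char_to_id):
    """Return one integer id per character of `word`.

    `char_to_id` is a hypothetical vocabulary dict; unseen characters
    are assigned the next free id as they are encountered.
    """
    return [char_to_id.setdefault(ch, len(char_to_id)) for ch in word]


char_to_id = {}
english_ids = word_to_char_ids("tagger", char_to_id)  # six characters, six ids
chinese_ids = word_to_char_ids("局", char_to_id)       # one character, one id
print(english_ids, chinese_ids)
```

Under this assumption, a token that is a single Chinese character yields a character sequence of length one, so the character-level part would see only that one character per token.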
Now I want to use it with my Chinese data. My data is formatted like this:
给 O
予 O
局 B-T
部 I-T
抗 I-T
炎 I-T
It matches the format of your English corpus, except that your corpus has one English word per line and my data has one Chinese character per line. Can data in this format be used with this program?
I look forward to your help. Thank you very much.

Sorry @PCR11, I don't know Chinese well enough to say how you should adapt your corpus to run the network, but I think it's definitely possible 🤔
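
As a rough check that the one-character-per-line layout from the question is structurally the same as a one-word-per-line layout, here is a minimal sketch of a reader for "token tag" lines. It assumes that sentences are separated by blank lines, which the thread does not state, and `read_sentences` is a hypothetical helper, not the tagger's own loader.

```python
def read_sentences(lines):
    """Group 'token tag' lines into sentences of (token, tag) pairs.

    Assumption: a blank line marks the end of a sentence.
    The first column is the token and the last column is the tag.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# The six lines quoted in the question, followed by a blank separator.
sample = ["给 O", "予 O", "局 B-T", "部 I-T", "抗 I-T", "炎 I-T", ""]
sentences = read_sentences(sample)
print(sentences)
```

A reader like this does not care whether the first column holds an English word or a single Chinese character, so the only corpus-specific preparation in this sketch is making sure sentence boundaries are marked.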