About the parameters in "config.ini"
Hi @bzhangGo,
In the file "config.ini" there is a parameter named "oov num". Can you explain exactly what it means? Is it the number of Chinese OOV words, English OOV words, or the total of both? Should I count it and rewrite it before I build the entire project?
I found a vector for "" at the beginning of both the ch and en vector files. Is it necessary?
In the demo data set, nearly all of the Chinese phrases start with % or $. What does that mean?
PS: If you have any training tricks, please share them.
- OOV num indicates the minimum frequency of a word to be included in the vocabulary.
- I don't understand the "" vector problem you pointed out. Can you give more details?
- The demo data come from the bilingual phrases extracted by our SMT system. The symbols keep their ordinary meaning, i.e. % means percent and $ means US dollar. They are not special symbols required by our model.
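To illustrate the first point, here is a minimal sketch of how a minimum-frequency threshold typically shapes a vocabulary. It is not the project's code: the function names, the `<unk>` token, and the toy corpus are assumptions made for the example.

```python
from collections import Counter

def build_vocab(sentences, min_freq, unk="<unk>"):
    """Keep only words that occur at least min_freq times (hypothetical helper)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    # Sort by descending frequency, then alphabetically, so IDs are deterministic.
    kept = sorted((w for w, c in counts.items() if c >= min_freq),
                  key=lambda w: (-counts[w], w))
    vocab = {unk: 0}  # ID 0 is reserved for out-of-vocabulary words
    for w in kept:
        if w != unk:
            vocab[w] = len(vocab)
    return vocab

def encode(sentence, vocab, unk="<unk>"):
    """Map each token to its ID; words below the threshold fall back to unk."""
    return [vocab.get(tok, vocab[unk]) for tok in sentence.split()]

corpus = ["the cat sat", "the cat ran", "a dog ran"]
vocab = build_vocab(corpus, min_freq=2)
print(vocab)                       # {'<unk>': 0, 'cat': 1, 'ran': 2, 'the': 3}
print(encode("a dog ran", vocab))  # [0, 0, 2]
```

Under this reading, the threshold is applied to word counts, so it would not need to be recomputed from the number of OOV words in either language.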
Thank you so much. Question 2 was about the vector for "</s>"; sorry, the token was lost when I typed it directly. I have since learned that it is just a special space character created by the C word2vec tool, and that the Python word2vec tools have dropped it. By the way, could you add instructions to this project covering all the parameters in Config.ini, to help others understand it? That would be great!
Thank you so much again, with best wishes!
Thanks for your suggestions. I will provide more explanation of the parameters in Config.ini.
Sorry for asking more questions. After the code runs for a few iterations, it fails with the error "Segmentation fault (core dumped)". Have you ever run into this problem, and can you give me some advice? Thanks.
Sorry, but I don't remember seeing this kind of error. Perhaps an array index is going beyond the array's size.
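One way to test that guess is to check the input data before training. The sketch below assumes the phrases have already been converted to lists of integer word IDs and that the embedding table has `vocab_size` rows; the function name and data layout are hypothetical and not taken from the project.

```python
def find_out_of_range(phrases, vocab_size):
    """Return (phrase_index, position, word_id) for every ID outside [0, vocab_size)."""
    bad = []
    for i, ids in enumerate(phrases):
        for j, wid in enumerate(ids):
            if not 0 <= wid < vocab_size:
                bad.append((i, j, wid))
    return bad

# Toy data: with 5 vocabulary entries, valid IDs are 0..4.
phrases = [[0, 3, 2], [5, 1], [-1, 4]]
print(find_out_of_range(phrases, vocab_size=5))  # [(1, 0, 5), (2, 0, -1)]
```

If a check like this reports nothing, the input IDs are probably not the cause and the problem lies elsewhere.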