DeepLearnXMU/BattRAE

about the parameters in "config.ini"

Closed this issue · 5 comments

Hi @bzhangGo:
  1. In the file "config.ini" there is a parameter named oov num. Can you explain exactly what it means? Is it the number of Chinese OOV words, of English OOV words, or the total of both? Should I count it and update the value before I build the entire project?
  2. I found the vector of "" at the beginning of both the ch and en vector files. Is it necessary?
  3. In the demo data set, nearly all of the Chinese phrases start with % or $. What does that mean?
PS: If there are any training tricks, please tell me.

  1. OOV num indicates the minimum frequency of a word to be included in the vocabulary.
  2. I don't understand the "" vector problem you pointed out. Can you give more details?
  3. The demo data come from bilingual phrases extracted by our SMT system. The symbols keep their original meaning, i.e. % means percent and $ means US dollar. They are not special symbols required by our model.
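
To make answer 1 concrete, here is a minimal Python sketch of a minimum-frequency vocabulary cutoff. It is not BattRAE's code: the function name `build_vocab`, the `<unk>` token, and the toy corpus are all assumptions made for illustration, and how BattRAE itself applies the oov num value is not shown in this thread.

```python
from collections import Counter

def build_vocab(sentences, min_freq):
    """Keep words seen at least `min_freq` times; everything else is OOV."""
    counts = Counter(word for sent in sentences for word in sent.split())
    # Sort by descending frequency, then alphabetically, so ids are stable.
    kept = sorted((w for w, c in counts.items() if c >= min_freq),
                  key=lambda w: (-counts[w], w))
    vocab = {"<unk>": 0}  # hypothetical placeholder id for OOV words
    for w in kept:
        vocab[w] = len(vocab)
    return vocab

# Toy corpus (made up): "the", "cat", "sat" occur twice, the rest once.
corpus = ["the cat sat", "the cat ran", "a dog sat"]
vocab = build_vocab(corpus, min_freq=2)
oov_words = {w for s in corpus for w in s.split() if w not in vocab}
```

With `min_freq=2`, the words seen only once ("ran", "a", "dog") fall outside the vocabulary. Under this reading the value is a threshold to choose, not a count of OOV words to measure in advance.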

Thank you so much. Question 2 is about the vector of "< / s >"; sorry, the token gets lost if I type it directly. I now know that it is just a special character created by the C word2vec tool, and that it has been dropped in the Python word2vec tools. By the way, could you add instructions to this project that describe all the parameters in Config.ini, to help others understand it? That would be great!
Thank you so much again, and best wishes!
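
For anyone who wants to check their own vector files for this entry, here is a small Python sketch that reads the word2vec text format (a header line with vocabulary size and dimension, then one word and its values per line). The function name `load_text_vectors` and the sample data are made up for illustration. Whether BattRAE expects the entry to stay in the file is not answered in this thread, so the sketch only detects it and leaves dropping it as an option.

```python
import io

def load_text_vectors(handle, drop_tokens=()):
    """Read word2vec text format: 'vocab_size dim', then 'word v1 ... vdim'."""
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        word = parts[0]
        if word in drop_tokens:
            continue  # optionally drop special tokens such as </s>
        vectors[word] = [float(x) for x in parts[1:]]
    return dim, vectors

# Made-up sample in the same layout as a C word2vec text dump.
text = "3 2\n</s> 0.0 0.1\n% 0.2 0.3\n$ 0.4 0.5\n"
dim, vectors = load_text_vectors(io.StringIO(text))
has_eos = "</s>" in vectors
_, cleaned = load_text_vectors(io.StringIO(text), drop_tokens=("</s>",))
```

If the entry is dropped, the vocabulary size in the header no longer matches the number of rows, so any file written back out would need an updated header.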

Thanks for your suggestions. I will provide more explanation of the parameters in Config.ini.

Sorry to ask more questions. When I run the code, after a few iterations it fails with the error "Segmentation fault (core dumped)". Have you ever met this problem, and can you give me some advice? Thanks.

Sorry, but I do not remember seeing this kind of error. Perhaps an array index is out of bounds.
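
One cheap way to test the out-of-bounds guess before rerunning the training is to scan the input for word ids that fall outside the embedding table. The Python sketch below assumes a hypothetical layout in which each phrase is a list of integer word ids; it is not BattRAE's actual data format, and the name `find_out_of_range` is made up.

```python
def find_out_of_range(phrases, vocab_size):
    """Return (phrase_index, position, word_id) for ids outside [0, vocab_size)."""
    bad = []
    for i, ids in enumerate(phrases):
        for j, wid in enumerate(ids):
            if not 0 <= wid < vocab_size:
                bad.append((i, j, wid))
    return bad

# Made-up example: ids 5 and -1 do not fit a table with 5 rows.
phrases = [[0, 3, 2], [1, 5], [4, -1]]
problems = find_out_of_range(phrases, vocab_size=5)
```

An empty result does not rule out other causes of a segmentation fault; it only eliminates this one.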