dalinvip/cw2vec

Killed while initializing features

Jun-jie-Huang opened this issue · 3 comments

Hi, I'm running your code to train the 'substoke' model on my 80 GB corpus, but the process was killed.
Here's a screenshot of the error. I modified run.sh as follows:

path_input=/data2/private/huangjunjie/COS960
path_out=.

# Replace ./bin with the binaries built under ./word2vec
rm -rf ./bin
cp -rf ./word2vec/bin .

# Train cw2vec's stroke n-gram model ("substoke" subcommand)
./bin/word2vec substoke -input ${path_input}/SogouT_all \
    -infeature ./Simplified_Chinese_Feature/sin_chinese_feature.txt \
    -output ${path_out}/cw2vec_vector \
    -lr 0.025 -dim 300 -ws 5 -epoch 5 -minCount 10 -neg 5 -loss ns \
    -minn 3 -maxn 18 -thread 20 -t 1e-4 -lrUpdateRate 100

Could you tell me what's wrong with it? Thanks.

An 80 GB corpus? Maybe it ran out of memory and was killed.
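One quick way to check the out-of-memory theory is to estimate the size of the weight matrices alone. The sketch below assumes cw2vec follows fastText's layout: an input matrix with (nwords + buckets) rows and an output matrix with nwords rows, each `dim` float32 values wide. That layout, and the function name `estimateMatrixBytes`, are assumptions for illustration, not taken from the cw2vec source.

```cpp
#include <cassert>
#include <cstdint>

// Rough lower bound (bytes) for the float32 weight matrices of a
// fastText-style model. ASSUMPTION: input rows = nwords + buckets,
// output rows = nwords, as in fastText. The dictionary, the word strings,
// and per-thread buffers are ignored, so real usage is higher.
std::uint64_t estimateMatrixBytes(std::uint64_t nwords,
                                  std::uint64_t buckets,
                                  std::uint64_t dim) {
    const std::uint64_t bytesPerFloat = 4;
    const std::uint64_t inputRows = nwords + buckets;
    const std::uint64_t outputRows = nwords;
    return (inputRows + outputRows) * dim * bytesPerFloat;
}
```

With purely illustrative numbers (1,000,000 words, 2,000,000 buckets, `-dim 300`), `estimateMatrixBytes(1000000, 2000000, 300)` gives 4,800,000,000 bytes, about 4.8 GB before counting the dictionary itself.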

It only uses 20% of the memory, and I can run fastText on the same 80 GB corpus without problems, so I don't think it's running out of memory. It's killed while initializing the stroke features.
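To see why stroke-feature initialization can be heavy with `-minn 3 -maxn 18`, here is a sketch of fastText-style n-gram extraction applied to a stroke sequence. It assumes each stroke is encoded as one ASCII digit ('1' to '5') and that the sequence is wrapped in "<" and ">" boundary markers as fastText does for characters; the real format of sin_chinese_feature.txt and cw2vec's exact extraction rules may differ. `strokeNgrams` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Enumerate every substring of length minn..maxn of "<" + strokes + ">".
// ASSUMPTION: one character per stroke and fastText-style boundary markers.
std::vector<std::string> strokeNgrams(const std::string& strokes,
                                      int minn, int maxn) {
    const std::string s = "<" + strokes + ">";
    const int len = static_cast<int>(s.size());
    std::vector<std::string> out;
    for (int n = minn; n <= maxn; ++n) {
        for (int i = 0; i + n <= len; ++i) {
            out.push_back(s.substr(i, n));
        }
    }
    return out;
}
```

For a word with 10 strokes, `strokeNgrams("1234512345", 3, 18)` already yields 55 n-grams, and the count grows roughly quadratically with stroke length, so a large maxn over a vocabulary built from 80 GB of text produces a very large feature set.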

Try increasing max_vocab_size in https://github.com/bamtercelboo/cw2vec/blob/master/word2vec/src/include/dictionary.h#L43 and rebuilding.
I haven't trained on a dataset this large, so it may be a built-in size limit; you can try raising it.
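The reason a constant like max_vocab_size matters can be seen with a minimal sketch of a fixed-capacity, open-addressing word table, the structure fastText uses for its vocabulary. Whether cw2vec's Dictionary works exactly this way is an assumption; `FixedVocab` and `addOrFind` are hypothetical names. Unlike this sketch, a table whose lookup loop has no probe limit can spin forever once every slot is full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical fixed-capacity vocabulary using linear probing.
// ASSUMPTION: the capacity plays the role of max_vocab_size.
class FixedVocab {
public:
    explicit FixedVocab(std::size_t capacity) : slots_(capacity, -1) {}

    // Returns the word's id, inserting it if absent; -1 if the table is full.
    int addOrFind(const std::string& w) {
        const std::size_t cap = slots_.size();
        std::size_t h = hash(w) % cap;
        for (std::size_t probes = 0; probes < cap; ++probes) {
            const int id = slots_[h];
            if (id == -1) {  // empty slot: insert the new word here
                slots_[h] = static_cast<int>(words_.size());
                words_.push_back(w);
                return slots_[h];
            }
            if (words_[id] == w) return id;  // already present
            h = (h + 1) % cap;               // linear probing
        }
        return -1;  // every slot is taken: the vocabulary does not fit
    }

    std::size_t size() const { return words_.size(); }

private:
    // 32-bit FNV-1a, the hash fastText applies to words.
    static std::uint32_t hash(const std::string& s) {
        std::uint32_t h = 2166136261u;
        for (char c : s) {
            h ^= static_cast<std::uint32_t>(static_cast<std::int8_t>(c));
            h *= 16777619u;
        }
        return h;
    }

    std::vector<int> slots_;
    std::vector<std::string> words_;
};
```

Because the slot array is preallocated, raising the capacity costs memory up front (one `int` per slot), which is the trade-off of bumping max_vocab_size. fastText itself avoids filling the table by pruning low-frequency words once it is about 75% full.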