panyang/Wikipedia_Word2vec

The error reported is:

molyswu opened this issue · 2 comments

Hi,
I ran:
v1# python train_word2vec_model.py wiki.zh.text.jian.seg.utf-8 wiki.zh.text.model wiki.zh.text.vector
2017-05-12 01:19:45,578: INFO: running train_word2vec_model.py wiki.zh.text.jian.seg.utf-8 wiki.zh.text.model wiki.zh.text.vector
2017-05-12 01:19:45,594: INFO: collecting all words and their counts
2017-05-12 01:19:45,648: INFO: PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2017-05-12 01:19:50,171: INFO: PROGRESS: at sentence #10000, processed 6464399 words, keeping 725285 word types
2017-05-12 01:19:53,546: INFO: PROGRESS: at sentence #20000, processed 11125064 words, keeping 1120049 word types
2017-05-12 01:19:58,920: INFO: PROGRESS: at sentence #30000, processed 15348776 words, keeping 1423306 word types
2017-05-12 01:20:01,128: INFO: PROGRESS: at sentence #40000, processed 19278980 words, keeping 1693287 word types
2017-05-12 01:20:03,203: INFO: PROGRESS: at sentence #50000, processed 22967412 words, keeping 1928859 word types
2017-05-12 01:20:04,554: INFO: PROGRESS: at sentence #60000, processed 26514303 words, keeping 2139812 word types
2017-05-12 01:20:07,120: INFO: PROGRESS: at sentence #70000, processed 29850501 words, keeping 2337565 word types
2017-05-12 01:20:09,387: INFO: PROGRESS: at sentence #80000, processed 33111262 words, keeping 2527187 word types
2017-05-12 01:20:11,163: INFO: PROGRESS: at sentence #90000, processed 36251605 words, keeping 2695901 word types
Traceback (most recent call last):
File "train_word2vec_model.py", line 27, in <module>
workers=multiprocessing.cpu_count())
File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 478, in __init__
self.build_vocab(sentences, trim_rule=trim_rule)
File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 553, in build_vocab
self.scan_vocab(sentences, progress_per=progress_per, trim_rule=trim_rule) # initial survey
File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 575, in scan_vocab
vocab[word] += 1
MemoryError
Thank you!
molyswu

"MemoryError" means you ran out of RAM: you need a machine with more memory, or you should test on a smaller dataset first.
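Besides more RAM or less data, gensim's `Word2Vec` constructor accepts a `max_vocab_size` argument that prunes infrequent words during the vocabulary scan, which is exactly the phase (`scan_vocab`) that overflows here. The pruning idea can be sketched in pure Python (the cap value and function name are illustrative, not gensim's actual implementation):

```python
from collections import Counter

def scan_vocab_bounded(sentences, max_vocab_size=100000):
    """Count word frequencies, pruning rare words whenever the number
    of distinct word types exceeds max_vocab_size. This bounds peak
    memory during the vocabulary scan, at the cost of possibly
    dropping words that would later have crossed min_count."""
    vocab = Counter()
    min_reduce = 1  # prune threshold, raised after each prune pass
    for sentence in sentences:
        vocab.update(sentence)
        if len(vocab) > max_vocab_size:
            # drop every word seen at most min_reduce times so far
            for word in [w for w, c in vocab.items() if c <= min_reduce]:
                del vocab[word]
            min_reduce += 1
    return vocab
```

In gensim itself this corresponds to passing e.g. `Word2Vec(sentences, max_vocab_size=2000000, min_count=5, workers=multiprocessing.cpu_count())`; a higher `min_count` also shrinks the final vocabulary.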

Hi,

import gensim
model = gensim.models.Word2Vec.load("wiki.zh.text.model")
model.most_similar(u"足球")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 1204, in most_similar
return self.wv.most_similar(positive, negative, topn, restrict_vocab, indexer)
File "/usr/local/lib/python2.7/dist-packages/gensim/models/keyedvectors.py", line 300, in most_similar
self.init_sims()
File "/usr/local/lib/python2.7/dist-packages/gensim/models/keyedvectors.py", line 813, in init_sims
self.syn0norm = (self.syn0 / sqrt((self.syn0 ** 2).sum(-1))[..., newaxis]).astype(REAL)
MemoryError

Thank you!
molyswu
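This second MemoryError is different: the model loaded fine, but `most_similar` triggers `init_sims`, which allocates a full L2-normalized copy of the vector matrix (`syn0norm`) next to the original. In gensim a common workaround is to call `model.init_sims(replace=True)` right after loading, which normalizes in place and discards the un-normalized vectors (you can no longer continue training afterwards). The in-place idea, sketched without numpy:

```python
from math import sqrt

def normalize_in_place(vectors):
    """L2-normalize each vector in place, mirroring the effect of
    gensim's init_sims(replace=True): no second copy of the matrix
    is allocated, so peak memory stays near the model's own size."""
    for vec in vectors:
        norm = sqrt(sum(x * x for x in vec))
        if norm > 0:
            for i in range(len(vec)):
                vec[i] /= norm
    return vectors
```

So the failing session above would become: load the model, call `model.init_sims(replace=True)`, then query `model.most_similar(u"足球")`.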