Data set is too big to be held in one machine's memory, so I need to split it into small daily sets
jackyhawk opened this issue · 2 comments
Thanks for the excellent code.
I have one question: my data set is too big to be held in one machine's memory, so I need to split it into small daily sets.
That means I should first generate each day's walk results (sequences) and then train on them with other code (such as Gensim) as word2vec.
All I want is the random walk results.
To get the walk results, should I just return before the part listed below,
and then save dw_rw to disk for later training?
You will need to deal with multiprocessing slightly better than I do in the training loop. One option would be to run the random walk generation and write to the file in a single thread. As for the place to return, that is correct.
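To make the single-threaded option concrete, here is a minimal sketch. It is not this repository's code: `load_adjacency`, `write_walks`, and their parameters are hypothetical names, the input is assumed to be a whitespace-separated edge list with one "u v" pair per line, and the walks are plain uniform random walks on an undirected graph. Each walk is written to disk as soon as it is generated, so the walk corpus itself never has to sit in memory.

```python
import random
from collections import defaultdict


def load_adjacency(edge_path):
    """Read an edge list ("u v" per line) into an undirected adjacency dict."""
    adj = defaultdict(list)
    with open(edge_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            u, v = parts[0], parts[1]
            adj[u].append(v)
            adj[v].append(u)
    return adj


def write_walks(adj, out_path, num_walks=10, walk_length=40, seed=0):
    """Append uniform random walks to out_path, one walk per line.

    Returns the number of walks written. Opening the file in append mode
    lets several daily runs accumulate into the same corpus file.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    written = 0
    with open(out_path, "a") as out:
        for _ in range(num_walks):
            rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
            for start in nodes:
                walk = [start]
                while len(walk) < walk_length:
                    neighbors = adj.get(walk[-1], ())
                    if not neighbors:
                        break  # dead end: keep the shorter walk
                    walk.append(rng.choice(neighbors))
                out.write(" ".join(walk) + "\n")
                written += 1
    return written
```

The output is one space-separated walk per line, which is the layout Gensim's `LineSentence` reader expects, so the file can be streamed into word2vec training later without loading every walk at once. Note that the adjacency dict still has to fit in memory for each daily subset.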
Thanks very much.
Is there any other repo available that can generate random walk sequences for a big data set?
I found that when I use a data set with more than 10 million edges, the memory required exceeds my machine's capacity (200 GB).