How much memory does your machine need?
SeekPoint opened this issue · 4 comments
It looks stuck at 16 GB of memory while running 'preprocess_partial_ner/save_emb.py'.
It's pretty large, over 200 GB. You can do some filtering before you run save_emb: some of the words in the pre-trained embedding never occur in the corpus.
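The filtering suggested above could be sketched roughly like this (a hypothetical illustration, not code from the repo; the GloVe-style one-word-per-line text format is an assumption about the embedding file):

```python
def build_vocab(corpus_lines):
    """Collect the set of words that actually occur in the corpus."""
    vocab = set()
    for line in corpus_lines:
        vocab.update(line.split())
    return vocab

def filter_embeddings(emb_lines, vocab):
    """Keep only embedding lines whose first token (the word) is in the corpus vocab."""
    kept = []
    for line in emb_lines:
        word = line.split(' ', 1)[0]
        if word in vocab:
            kept.append(line)
    return kept
```

Running save_emb on the filtered file should then only need memory proportional to the corpus vocabulary, not the full pre-trained vocabulary.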
200 GB of memory? That sounds terrible; maybe we just need a light version.
Sorry for the confusion. Our server has over 200 GB of memory, but I think such large memory usage is not necessary.
That script is mainly designed to pre-save the word embeddings. As you may have noticed, the pre-trained embedding file is pretty large, so high memory consumption is expected.
But definitely, we will also release a light version later!
We have replaced the .txt download with a .pk file, which skips the encoding step. Maybe you can give it a try now.