Skipgram mode has bugs
Hi Tzu-Ray,
I tested your code with --cbow 0, and it gives this error:
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/Users/yvx5085/Github/pytorch-word2vec/main.py", line 212, in train_process_worker
data_queue.put(data)
NameError: name 'data' is not defined
So I went to line 212 of main.py, and noticed that 'data' might need to be replaced by 'chunk'.
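A minimal reproduction of that first error, with hypothetical names standing in for the repo's actual code, would look like this: the worker receives `chunk` but puts an undefined name `data` on the queue.

```python
import queue

# Hypothetical sketch (not the repo's actual code) of the suspected bug:
# the function's parameter is `chunk`, but the body references `data`,
# which was never defined in this scope.
def train_process_worker(chunk, data_queue):
    data_queue.put(data)  # NameError: `data` does not exist here

q = queue.Queue()
try:
    train_process_worker([1, 2, 3], q)
except NameError as e:
    print("NameError:", e)
```

Renaming `data` to `chunk` inside the function would make the put succeed, which matches the proposed fix.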
Then I made the change and ran the code again, and got another error:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/Users/yvx5085/Github/pytorch-word2vec/main.py", line 212, in train_process_worker
data_queue.put(chunk)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/queues.py", line 341, in put
obj = _ForkingPickler.dumps(obj)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
File "stringsource", line 2, in View.MemoryView._memoryviewslice.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
So I guess there is a bug in the sg_producer function.
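For context, multiprocessing queues pickle every item they transfer, and Cython's typed memoryview slices do not define a default `__reduce__`, which is what the TypeError is complaining about. A common remedy is to convert the slice to a concrete object (e.g. `np.asarray(chunk)`) before calling `put`. The sketch below mirrors the failure using Python's built-in `memoryview` as a stand-in for the Cython slice:

```python
import pickle

# A built-in memoryview, like a Cython typed memoryview, is a thin view
# over a buffer and cannot be pickled directly.
buf = memoryview(b"example")

try:
    pickle.dumps(buf)
except TypeError as e:
    print("pickling failed:", e)

# Converting the view to a concrete container first makes it picklable --
# bytes here; for a numeric Cython slice, np.asarray(chunk) plays the
# same role.
payload = bytes(buf)
assert pickle.loads(pickle.dumps(payload)) == b"example"
```

This is only an illustration of the failure mode, not the repo's actual fix.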
BTW, I have tried --cbow 1 with a small English dataset, and it worked fine.
I like this project a lot, because it uses the flexibility of PyTorch and Cython, so I can get very good training speed without having to deal with, for example, custom ops in TensorFlow.
I will run more tests on Chinese data and let you know the results.
Best of luck!
Yang
Hi Yang,
You're right, this is a bug. I fixed it, ran some tests, and pushed a new commit. Please check out the changes and let me know if there are still any problems. 😃
Thanks for the kind words about this repo! FYI, I am using the Chinese Gigaword corpus, and I plan to use text8 for benchmarking (see the README).
Tzu-Ray