vecto-ai/vecto

incompatible array types Error

MoizRauf opened this issue · 5 comments

Hi,

I'm trying to train embeddings on multilingual data and get the following "incompatible array types" exception when running with these parameters:
--subword bilstm --dimension 300 --verbose --gpu

Traceback (most recent call last):
File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/train_word2vec.py", line 260, in <module>
main()
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/train_word2vec.py", line 256, in main
run(args)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/train_word2vec.py", line 242, in run
model = train(args)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/train_word2vec.py", line 235, in train
model = create_model(args, model, vocab)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/train_word2vec.py", line 120, in create_model
model.matrix = cuda.to_cpu(net.getEmbeddings())
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/utils/subword.py", line 488, in getEmbeddings
return self.getEmbeddings_f()
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/utils/subword.py", line 528, in getEmbeddings_f
e_batch = self.f(tokenIdsList_merged, tokenIdsList_merged_b, argsort, argsort_reverse, pList)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/utils/subword.py", line 304, in __call__
self.rnn(tokenIdsList_ordered[:, i])
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/vecto/embeddings/utils/subword.py", line 375, in rnn
x = self.embed(cur_word)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/chainer/link.py", line 242, in __call__
out = forward(*args, **kwargs)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/chainer/links/connection/embed_id.py", line 70, in forward
return embed_id.embed_id(x, self.W, ignore_label=self.ignore_label)
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/chainer/functions/connection/embed_id.py", line 164, in embed_id
return EmbedIDFunction(ignore_label=ignore_label).apply((x, W))[0]
File "/mount/arbeitsdaten31/studenten1/raufmz/virtual/vecto/lib/python3.7/site-packages/chainer/function_node.py", line 237, in apply
', '.join(str(type(x)) for x in in_data)))
TypeError: incompatible array types are mixed in the forward input (EmbedIDFunction).
Actual: <class 'numpy.ndarray'>, <class 'cupy.core.core.ndarray'>

Hi, MoizRauf,

Sorry for the inconvenience.
The error is a known bug caused by differences between Chainer versions. We have already fixed it in the "cli-train-embeddings" branch.
A quick solution would be to use that branch to train the model. We will merge the "cli-train-embeddings" branch into the master branch and get back to you soon.

Best,
Bofang
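
For readers hitting the same TypeError: the message says that one input to EmbedIDFunction was a NumPy array (CPU) while the other was a CuPy array (GPU). The snippet below is only a minimal diagnostic sketch, not the fix that went into the "cli-train-embeddings" branch. The helper names `array_backend` and `check_same_backend` are hypothetical, and the demo uses standard-library types as stand-ins so that it runs without a GPU.

```python
import array


def array_backend(arr):
    """Return the top-level package that defines arr's type.

    For a NumPy array this is 'numpy'; for a CuPy array it is 'cupy'.
    """
    return type(arr).__module__.split(".")[0]


def check_same_backend(**named_arrays):
    """Raise TypeError if the given arrays come from different libraries.

    Hypothetical helper: call it just before a layer's forward pass to see
    which input lives on the wrong device.
    """
    backends = {name: array_backend(a) for name, a in named_arrays.items()}
    if len(set(backends.values())) > 1:
        detail = ", ".join(f"{n}: {b}" for n, b in backends.items())
        raise TypeError("incompatible array types are mixed (" + detail + ")")
    return backends


# Stand-in demo: array.array and list play the roles of the two array types.
ids = array.array("i", [1, 2, 3])
weights = [0.1, 0.2, 0.3]
try:
    check_same_backend(ids=ids, weights=weights)
except TypeError as exc:
    print(exc)
```

In a Chainer model, a mismatch like this is usually resolved by moving the CPU-side array to the GPU before the call, for example with `chainer.backends.cuda.to_gpu(...)`. Whether that matches the actual change in the branch is an assumption.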

Hi Bofang, thank you for the reply. I'll try the cli-train-embeddings branch. Additionally, I would like to ask roughly how long training took on CPU. Regards, Moiz

Hi, Moiz,

The training code is designed to run only on GPU. I don't have exact numbers, but I don't think training on CPU is feasible, even on a small corpus.

Best,
Bofang

Hi, Moiz,

We just merged the "cli-train-embeddings" branch into the "master" branch, so you can now use the master branch as well.

Best,
Bofang