shape mismatch error running 06_precompute_cache.py
joshweir opened this issue · 2 comments
Running scripts/06_precompute_cache.py against the 2015 or 2019 data model fails:
$ python ./06_precompute_cache.py -c 100000 -n 100 $S2V_MODEL_PATH
✔ Loaded 1,195,261 vectors with dimension 128
✔ Normalized (mean 3.76, variance 1.89)
ℹ Finding 100 neighbors among 100,000 most frequent
8%|██████▉ | 97/1168 [15:21<2:48:46, 9.46s/it]
Traceback (most recent call last):
File "./06_precompute_cache.py", line 176, in <module>
plac.call(main)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "./06_precompute_cache.py", line 70, in main
xp.put_along_axis(sims, indices, -xp.inf, axis=1)
File "<__array_function__ internals>", line 6, in put_along_axis
File "/usr/local/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 262, in put_along_axis
arr[_make_along_axis_idx(arr_shape, indices, axis)] = values
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1024,1) (672,1)
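The two shapes in the error line up with the numbers in the log, which may help narrow the cause. The progress bar shows 1168 batches for 1,195,261 vectors, which fits a batch size of 1024, and it had completed 97 batches when the error was raised. The sketch below redoes that arithmetic and then reproduces the same kind of IndexError on a scaled-down array. The array sizes (8 rows, 20 columns) and the variable names are illustrative assumptions, not taken from the script.

```python
import numpy as np

# Values taken from the report above; batch_size is inferred from the
# 1168 batches shown for 1,195,261 vectors.
batch_size, cutoff = 1024, 100_000
start = 97 * batch_size          # first row of the batch that failed
print(start, cutoff - start)     # 99328 672

# Scaled-down reproduction (illustrative sizes): 8 rows in the batch,
# but only 5 index rows because only 5 columns remain before the cutoff.
sims = np.zeros((8, 20), dtype="float32")
indices = np.arange(15, 20).reshape((-1, 1))   # shape (5, 1)

try:
    np.put_along_axis(sims, indices, -np.inf, axis=1)
    error = None
except IndexError as err:
    error = str(err)

print(error)
```

Under this reading, only 672 of the 100,000 columns remain once the batch starting at row 99,328 is reached, so the index array has 672 rows while the batch still has 1024.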
I think you might be running an older commit? On the version of the script on master, the put_along_axis call has been replaced. So if you pull, you should be able to get it working.
It's much faster on GPU if you have one, by the way.
I pulled master but couldn't get it to work. Master still appears to have the put_along_axis call at line 70 (sense2vec/scripts/06_precompute_cache.py, line 70 in 61eb3a7).
If I change the NumPy call xp.put_along_axis to the put_along_axis defined in the script, it fails because I don't have cupy installed. I'm running this on my MacBook Pro, which has no NVIDIA GPU and hence no CUDA.
josh@JoshsMacBook ~/sense2vec (master=)
$ git pull origin master
From https://github.com/explosion/sense2vec
* branch master -> FETCH_HEAD
Already up to date.
josh@JoshsMacBook ~/sense2vec (master=)
$ python ./scripts/06_precompute_cache.py -c 1000 -n 10 $S2V_MODEL_PATH
✔ Loaded 1,195,261 vectors with dimension 128
✔ Normalized (mean 3.76, variance 1.89)
ℹ Finding 10 neighbors among 1,000 most frequent
0%| | 0/1168 [00:00<?, ?it/s]
Traceback (most recent call last):
File "./scripts/06_precompute_cache.py", line 176, in <module>
plac.call(main)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "./scripts/06_precompute_cache.py", line 70, in main
xp.put_along_axis(sims, indices, -xp.inf, axis=1)
File "<__array_function__ internals>", line 6, in put_along_axis
File "/usr/local/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 262, in put_along_axis
arr[_make_along_axis_idx(arr_shape, indices, axis)] = values
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1024,1) (1000,1)
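One possible CPU-only workaround is sketched below. It assumes the call at line 70 is meant to exclude each row's similarity to itself, which the thread does not state, and that sims has one row per vector in the batch and one column per vector below the cutoff. The helper name mask_self_similarities and its start argument are hypothetical and are not part of sense2vec. It uses plain integer indexing instead of put_along_axis, so it only touches rows whose own column exists.

```python
import numpy as np


def mask_self_similarities(sims, start):
    """Set sims[r, start + r] to -inf wherever that column exists.

    Hypothetical helper: `start` is the index of the batch's first row
    in the full vector table.
    """
    n_rows, n_cols = sims.shape
    # Number of rows whose own index (start + r) is still below the cutoff.
    count = max(0, min(n_rows, n_cols - start))
    rows = np.arange(count)
    sims[rows, start + rows] = -np.inf
    return sims


# Scaled-down check: a batch of 8 rows starting at index 15, cutoff of 20.
sims = mask_self_similarities(np.zeros((8, 20), dtype="float32"), start=15)
masked = int(np.isinf(sims).sum())

# A batch that starts past the cutoff has nothing to mask.
untouched = mask_self_similarities(np.zeros((8, 20), dtype="float32"), start=24)
```

In the scaled-down check, only the first 5 rows are masked, since rows 5 to 7 correspond to vectors beyond the cutoff and have no column of their own. The same indexing pattern should also work on CuPy arrays, though that has not been verified here.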