explosion/sense2vec

shape mismatch error running 06_precompute_cache.py

joshweir opened this issue · 2 comments

Running scripts/06_precompute_cache.py against either the 2015 or the 2019 model fails with a shape mismatch:

$ python ./06_precompute_cache.py -c 100000 -n 100 $S2V_MODEL_PATH
✔ Loaded 1,195,261 vectors with dimension 128
✔ Normalized (mean 3.76, variance 1.89)
ℹ Finding 100 neighbors among 100,000 most frequent
  8%|██████▉          | 97/1168 [15:21<2:48:46,  9.46s/it]
Traceback (most recent call last):
  File "./06_precompute_cache.py", line 176, in <module>
    plac.call(main)
  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "./06_precompute_cache.py", line 70, in main
    xp.put_along_axis(sims, indices, -xp.inf, axis=1)
  File "<__array_function__ internals>", line 6, in put_along_axis
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 262, in put_along_axis
    arr[_make_along_axis_idx(arr_shape, indices, axis)] = values
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1024,1) (672,1)

I think you might be running an older commit. In the version of the script on master, the put_along_axis call has been replaced, so if you pull, you should be able to get it working.

It's much faster on GPU if you have one, by the way.

I pulled master but still couldn't get it to work. master still appears to have the put_along_axis call at line 70:

xp.put_along_axis(sims, indices, -xp.inf, axis=1)

If I switch from the NumPy call xp.put_along_axis to the put_along_axis function defined in the script, it fails because I don't have cupy installed (this is running on my MacBook Pro, which has no NVIDIA GPU and therefore no CUDA).

josh@JoshsMacBook ~/sense2vec (master=)
$ git pull origin master
From https://github.com/explosion/sense2vec
 * branch            master     -> FETCH_HEAD
Already up to date.
josh@JoshsMacBook ~/sense2vec (master=)
$ python ./scripts/06_precompute_cache.py -c 1000 -n 10 $S2V_MODEL_PATH
✔ Loaded 1,195,261 vectors with dimension 128
✔ Normalized (mean 3.76, variance 1.89)
ℹ Finding 10 neighbors among 1,000 most frequent
  0%|                                                                                               | 0/1168 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./scripts/06_precompute_cache.py", line 176, in <module>
    plac.call(main)
  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "./scripts/06_precompute_cache.py", line 70, in main
    xp.put_along_axis(sims, indices, -xp.inf, axis=1)
  File "<__array_function__ internals>", line 6, in put_along_axis
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 262, in put_along_axis
    arr[_make_along_axis_idx(arr_shape, indices, axis)] = values
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1024,1) (1000,1)
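
The shapes in both tracebacks are consistent with an index array that is clipped at the -c cutoff while sims keeps a full batch of rows: 100,000 - 97 * 1,024 = 672 in the first run, and min(1,024, 1,000) = 1,000 in the second. The sketch below reproduces the second error with NumPy only and shows one possible way to mask the self-similarities without cupy. It is a guess at the cause, not the script's actual code: BATCH_SIZE, index_rows, reproduce and mask_self_similarity are hypothetical names, and the batch size of 1024 is inferred from the (1024,1) shape in the error message.

```python
import numpy as np

BATCH_SIZE = 1024  # assumption: inferred from the (1024,1) shape in the tracebacks


def index_rows(start, cutoff, batch_size=BATCH_SIZE):
    """Rows in the index array if its range is clipped at the cutoff (assumed)."""
    return min(start + batch_size, cutoff) - start


def reproduce(start, cutoff, batch_size=BATCH_SIZE):
    """Return the IndexError message raised by put_along_axis, or None."""
    # sims has one row per vector in the batch and one column per cutoff entry.
    sims = np.zeros((batch_size, cutoff), dtype="float32")
    # The index array has fewer rows than sims once the cutoff clips the range.
    indices = np.arange(start, min(start + batch_size, cutoff)).reshape((-1, 1))
    try:
        np.put_along_axis(sims, indices, -np.inf, axis=1)
    except IndexError as err:
        return str(err)
    return None


def mask_self_similarity(sims, start):
    """Set sims[r, start + r] to -inf for every row whose own column exists."""
    n_rows, n_cols = sims.shape
    # Rows at or beyond the cutoff have no self-similarity column to mask.
    n_valid = max(0, min(n_rows, n_cols - start))
    rows = np.arange(n_valid)
    sims[rows, start + rows] = -np.inf
    return sims


if __name__ == "__main__":
    print(index_rows(97 * 1024, 100_000))  # rows in the index array, first run
    print(reproduce(0, 1000))              # error message for the second run
    masked = mask_self_similarity(np.zeros((1024, 1000), dtype="float32"), 0)
    print(int(np.isinf(masked).sum()))     # number of masked entries
```

Plain integer indexing is used here instead of put_along_axis so that the number of index rows never has to match the batch size. Whether this matches what the maintainers intended for rows beyond the cutoff is an open question.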