kayzhu/LSHash

Using LSHash on word vectors

armintabari opened this issue · 2 comments

I want to use this on a bunch of word vectors and find the similar ones.

Should I first index all of the vectors, and then query each one again to find its bucket number?

You index all the vectors first, and then use 'query' to retrieve close matches. If you need to track where your vectors come from, etc., use the 'extra_data' argument of the 'index' method.
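To make the index-then-query flow concrete, here is a rough standalone sketch. It is not LSHash's own code: `TinyLSH` is a hypothetical stand-in that uses random hyperplane hashing with only the standard library, and the three-dimensional "word vectors" are made-up toy values. The result shape `((point, extra_data), distance)` is this sketch's choice, so check the library's README for what it actually returns.

```python
import random
from collections import defaultdict


class TinyLSH:
    """Hypothetical stand-in for an LSH index; not LSHash's actual implementation."""

    def __init__(self, hash_size, input_dim, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per hash bit.
        self.planes = [[rng.gauss(0, 1) for _ in range(input_dim)]
                       for _ in range(hash_size)]
        self.buckets = defaultdict(list)

    def _hash(self, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return "".join(
            "1" if sum(p * v for p, v in zip(plane, vec)) > 0 else "0"
            for plane in self.planes
        )

    def index(self, vec, extra_data=None):
        # Store the vector together with its label in the matching bucket.
        self.buckets[self._hash(vec)].append((tuple(vec), extra_data))

    def query(self, vec, num_results=None):
        # Only vectors in the same bucket are candidates; rank them by
        # squared Euclidean distance to the query.
        candidates = self.buckets.get(self._hash(vec), [])
        scored = [((point, extra), sum((a - b) ** 2 for a, b in zip(point, vec)))
                  for point, extra in candidates]
        scored.sort(key=lambda item: item[1])
        return scored[:num_results]


# Toy vectors, invented for illustration only.
words = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.88, 0.82, 0.12],
    "apple": [-0.7, 0.1, 0.9],
}

lsh = TinyLSH(hash_size=4, input_dim=3)
for word, vec in words.items():
    lsh.index(vec, extra_data=word)  # the label travels with the vector

results = lsh.query(words["king"])
labels = [extra for (point, extra), dist in results]
print(labels)
```

Note that you never look up a bucket number yourself: the query hashes the vector internally and only compares it against whatever landed in the same bucket, which is why near neighbours can occasionally be missed with a single hash table.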

What format is needed for the 'extra_data' parameter? Does it save the label for the vector, so that a query can output the label rather than the vector itself?