word2vec Binary Index & Locality-Sensitive Hashing Module
GoogleCodeExporter opened this issue · 2 comments
GoogleCodeExporter commented
Just released a Ruby module that builds an index of a binary word2vec vector
file, so your code can seek directly to the right position in the file for a
given word or term. For example, the word "/en/italy" in the English
"freebase-vectors-skipgram1000-en.bin" file is at byte position 116414.
The module also computes a locality-sensitive hash (LSH) for each vector in a
binary word2vec file, so you can do a nearest-neighbor search (by cosine
distance) much faster. I get a couple of orders of magnitude better performance
on my machine with a 10-bit random-projection LSH.
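
A minimal sketch of the byte-offset index idea described above. This is not the linked module's code. It assumes the layout written by the original word2vec.c (an ASCII header "<vocab_size> <dim>", then for each entry the word, a space, dim little-endian float32 values, and a newline), and it assumes the index stores the offset of the vector's first float. The helper names build_offset_index and read_vector are hypothetical.

```ruby
require "stringio"

# Hypothetical helper: map each word to the byte offset of its vector's
# first float, scanning the file once.
def build_offset_index(io)
  vocab_size, dim = io.gets.split.map(&:to_i)
  index = {}
  vocab_size.times do
    # gets(" ") reads through the space that ends the word; strip drops that
    # space and the "\n" left over from the previous entry.
    word = io.gets(" ").strip.force_encoding("UTF-8")
    index[word] = io.pos
    io.seek(dim * 4, IO::SEEK_CUR) # skip dim float32 values without decoding
  end
  [index, dim]
end

# Hypothetical helper: seek straight to a word's vector and decode it.
def read_vector(io, index, dim, word)
  io.seek(index.fetch(word))
  io.read(dim * 4).unpack("e*") # "e" = little-endian single-precision float
end

# Tiny in-memory stand-in for a real .bin file (assumed sample data).
data = "2 3\n".b
data << "italy " << [1.0, 2.0, 3.0].pack("e*") << "\n"
data << "rome " << [0.5, -1.0, 4.0].pack("e*") << "\n"

io = StringIO.new(data)
index, dim = build_offset_index(io)
p index
p read_vector(io, index, dim, "rome")
```

With a real file, io would be something like File.open(path, "rb"). Once the index has been built and saved, a lookup costs one seek plus one read of dim * 4 bytes.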
https://github.com/someben/treebank/blob/master/src/build_word2vec_index.rb
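
A rough sketch of random-projection LSH as mentioned above, again not taken from the linked module. It assumes Gaussian random hyperplanes and one bucket per signature; the post does not say how the module draws its projections or stores its buckets. All function names and the sample vectors are hypothetical.

```ruby
# Standard normal sample via the Box-Muller transform.
def gaussian(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Draw `bits` random hyperplanes in `dim` dimensions (assumed: Gaussian normals).
def random_hyperplanes(bits, dim, rng = Random.new(42))
  Array.new(bits) { Array.new(dim) { gaussian(rng) } }
end

# One bit per hyperplane: set when the vector lies on its non-negative side.
def lsh_signature(vector, hyperplanes)
  hyperplanes.each_with_index.reduce(0) do |sig, (plane, i)|
    dot = plane.zip(vector).sum { |p, v| p * v }
    dot >= 0 ? sig | (1 << i) : sig
  end
end

# Group words by signature so a query only scans its own bucket.
def build_buckets(vectors, hyperplanes)
  vectors.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(word, vec), buckets|
    buckets[lsh_signature(vec, hyperplanes)] << word
  end
end

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Approximate search: rank only the candidates that share the query's bucket.
def nearest(query, vectors, buckets, hyperplanes, k = 5)
  candidates = buckets.fetch(lsh_signature(query, hyperplanes), [])
  candidates.max_by(k) { |word| cosine(query, vectors[word]) }
end

# Assumed toy data; real vectors would come from the binary file.
vectors = {
  "italy"  => [1.0, 0.2, 0.0],
  "rome"   => [0.9, 0.3, 0.1],
  "banana" => [-1.0, 0.5, -0.7]
}
planes  = random_hyperplanes(10, 3)
buckets = build_buckets(vectors, planes)
p nearest(vectors["italy"], vectors, buckets, planes, 2)
```

A 10-bit signature gives at most 1024 buckets, so each query compares against only a small fraction of the vocabulary. The trade-off is that true neighbors falling on the other side of any hyperplane are missed, which makes the result approximate.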
Thanks for the project, Tomas.
Best,
Ben
Original issue reported on code.google.com by bgimp...@googlemail.com
on 23 Sep 2013 at 3:48
GoogleCodeExporter commented
Sounds cool, thanks for sharing your code! By the way, there is a discussion
forum related to word2vec that might be more suitable for this type of post:
https://groups.google.com/forum/#!forum/word2vec-toolkit
It might be easier for the others to find your post there.
Best,
Tomas
Original comment by tmiko...@google.com
on 23 Sep 2013 at 5:01
GoogleCodeExporter commented
Ahh, of course -- I've re-posted there.
Original comment by bgimp...@googlemail.com
on 23 Sep 2013 at 5:09