kovasb/word2vec

word2vec Binary Index & Locality-Sensitive Hashing Module

GoogleCodeExporter opened this issue · 2 comments

Just released a Ruby module that builds an index of a binary word2vec vector
file, so your code can seek directly to the right position in the file for a
given word or term. For example, the term "/en/italy" in the English
"freebase-vectors-skipgram1000-en.bin" file is at byte position 116414.
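
For readers who want to see how such an index can work, here is a minimal Python sketch (not the Ruby module linked below). It assumes the layout that word2vec.c writes with `-binary 1`: an ASCII header line `<vocab_size> <dim>`, then for each entry the word, a space, `dim` raw float32 values, and a newline. It also assumes little-endian floats. The index records where each vector's floats begin, which may differ from the Ruby module's offset convention. `build_offset_index`, `read_vector`, and the demo helper `make_tiny_bin` are hypothetical names.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Tuple


def build_offset_index(f: BinaryIO) -> Tuple[Dict[str, int], int]:
    """Map each word to the byte offset where its float32 vector starts.

    Assumes the word2vec.c binary layout: a header line "<vocab_size> <dim>",
    then per entry: word, a space, dim float32 values, a trailing newline.
    """
    f.seek(0)
    vocab_size, dim = map(int, f.readline().split())
    index: Dict[str, int] = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            c = f.read(1)
            if not c or c == b" ":  # EOF or the space that ends the word
                break
            if c != b"\n":  # skip the newline left after the previous vector
                word += c
        index[word.decode("utf-8", errors="replace")] = f.tell()
        # Jump over the vector without decoding it; only words are scanned.
        f.seek(4 * dim, io.SEEK_CUR)
    return index, dim


def read_vector(f: BinaryIO, offset: int, dim: int) -> List[float]:
    """Seek straight to one vector using an offset taken from the index."""
    f.seek(offset)
    # "<" assumes little-endian float32 (what word2vec.c produces on x86).
    return list(struct.unpack(f"<{dim}f", f.read(4 * dim)))


def make_tiny_bin(vectors: Dict[str, List[float]]) -> bytes:
    """Hypothetical helper: build a tiny file in the assumed layout."""
    dim = len(next(iter(vectors.values())))
    out = bytearray(f"{len(vectors)} {dim}\n".encode("ascii"))
    for word, vec in vectors.items():
        out += word.encode("utf-8") + b" "
        out += struct.pack(f"<{dim}f", *vec)
        out += b"\n"
    return bytes(out)


if __name__ == "__main__":
    f = io.BytesIO(make_tiny_bin({"/en/italy": [0.1, 0.2, 0.3],
                                  "/en/france": [0.3, 0.2, 0.1]}))
    index, dim = build_offset_index(f)
    print(index)  # {'/en/italy': 14, '/en/france': 38}
    print(read_vector(f, index["/en/france"], dim))
```

The scan only has to happen once. After that, looking up a term is a single `seek` plus a read of `4 * dim` bytes. For a 1000-dimensional file that read is 4000 bytes, instead of a linear pass over the whole file.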

The module also computes a locality-sensitive hash for each vector in a binary
word2vec file, so you can do a nearest-neighbor search (by cosine distance)
much faster. With a 10-bit random-projection LSH, I get a couple of orders of
magnitude better performance on my machine.
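
The general random-projection technique can be sketched in a few lines of Python. This illustrates the standard random-hyperplane construction, not the Ruby module's code. The assumptions are a single hash table and Gaussian hyperplanes. `RandomProjectionLSH`, `signature`, and `query` are hypothetical names. Each of the 10 bits records which side of a random hyperplane a vector falls on. A query is then scored by cosine similarity only against vectors that share its bucket.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes neither vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


class RandomProjectionLSH:
    """Hypothetical single-table random-hyperplane LSH for cosine similarity."""

    def __init__(self, dim: int, n_bits: int = 10, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One random Gaussian hyperplane (its normal vector) per hash bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets: Dict[int, List[Tuple[str, List[float]]]] = defaultdict(list)

    def signature(self, v: Sequence[float]) -> int:
        # Bit i is 1 when v lies on the non-negative side of hyperplane i.
        h = 0
        for i, plane in enumerate(self.planes):
            if dot(plane, v) >= 0:
                h |= 1 << i
        return h

    def add(self, word: str, v: Sequence[float]) -> None:
        self.buckets[self.signature(v)].append((word, list(v)))

    def query(self, v: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        # Only vectors in v's bucket are scored; all other buckets are skipped.
        candidates = self.buckets.get(self.signature(v), [])
        scored = [(w, cosine(v, u)) for w, u in candidates]
        scored.sort(key=lambda ws: ws[1], reverse=True)
        return scored[:k]
```

For one random hyperplane, two vectors at angle θ get the same bit with probability 1 − θ/π. That is why vectors with a high cosine similarity tend to land in the same bucket. With 10 bits there are at most 2^10 = 1024 buckets. If the buckets are roughly balanced, a query scores only about 1/1024 of the vocabulary. The trade-off is recall: a true neighbor separated by even one hyperplane is missed. Common remedies are multiple independent tables or probing buckets whose signatures differ by one bit.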

https://github.com/someben/treebank/blob/master/src/build_word2vec_index.rb

Thanks for the project, Tomas.

Best,
Ben

Original issue reported on code.google.com by bgimp...@googlemail.com on 23 Sep 2013 at 3:48

Sounds cool, thanks for sharing your code! By the way, there is a discussion 
forum related to word2vec that might be more suitable for this type of post:

https://groups.google.com/forum/#!forum/word2vec-toolkit

It might be easier for others to find your post there.

Best,
Tomas

Original comment by tmiko...@google.com on 23 Sep 2013 at 5:01

Ahh, of course -- I've re-posted there.

Original comment by bgimp...@googlemail.com on 23 Sep 2013 at 5:09