
A Ruby gem that provides an interface to word2vec vector arithmetic.

Gem using word2vec functionality from https://code.google.com/archive/p/word2vec/

This gem was developed using the .c files of Google's word2vec as a base, mostly by copy-and-paste.


Add this line to your application's Gemfile:

gem 'word2vec-rb'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install word2vec-rb


Distance arithmetic: to find the nearest words, try:

require 'word2vec'

model = Word2vec::Model.load("./data/minimal.bin")
words = model.distance("from")
words.each do |w|
  puts "#{w.first} #{w.last}"
end
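
To make the idea of "nearest words" concrete, here is a minimal pure-Ruby sketch that ranks words by cosine similarity, the measure used by the original word2vec distance tool. It does not use the gem: `cosine_similarity`, `nearest_words`, and `TOY_VECTORS` are hypothetical names, and the two-dimensional vectors are made up for illustration.

```ruby
# Cosine similarity between two equal-length arrays of floats.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Returns [word, similarity] pairs for the `top` words closest to `query`.
# `vectors` is a hypothetical Hash of word => Array of floats (not the gem's API).
def nearest_words(vectors, query, top = 3)
  target = vectors.fetch(query)
  vectors
    .reject { |word, _| word == query }
    .map { |word, vec| [word, cosine_similarity(target, vec)] }
    .max_by(top) { |_, score| score }
end

# Made-up toy vectors, for illustration only.
TOY_VECTORS = {
  "from" => [1.0, 0.0],
  "to"   => [0.9, 0.1],
  "cat"  => [0.0, 1.0]
}

nearest_words(TOY_VECTORS, "from").each do |word, score|
  puts "#{word} #{score.round(3)}"
end
```

Each pair holds the word first and its score last, which mirrors how the results are printed in the example above.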

Analogy arithmetic: to find the analogy with three words, try:

require 'word2vec'

model = Word2vec::Model.load("./data/minimal.bin")
words = model.analogy("spain", "madrid", "france")
# With a well-prepared (high-quality) vectors file, the first word would be "Paris"
words.each do |w|
  puts "#{w.first} #{w.last}"
end
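
The analogy lookup can be pictured as vector arithmetic: the original word2vec word-analogy tool computes `b - a + c` and then looks for the word whose vector is closest to that result. The sketch below shows the idea in plain Ruby under that assumption. `cosine`, `toy_analogy`, and `ANALOGY_VECTORS` are hypothetical names, and the three-dimensional vectors are invented so that the arithmetic works out exactly.

```ruby
# Cosine similarity between two equal-length arrays of floats.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Finds the word closest to vectors[b] - vectors[a] + vectors[c],
# skipping the three input words themselves.
# `vectors` is a hypothetical Hash of word => Array of floats (not the gem's API).
def toy_analogy(vectors, a, b, c)
  va, vb, vc = vectors.fetch(a), vectors.fetch(b), vectors.fetch(c)
  target = va.each_index.map { |i| vb[i] - va[i] + vc[i] }
  candidates = vectors.reject { |word, _| [a, b, c].include?(word) }
  candidates.max_by { |_, vec| cosine(target, vec) }.first
end

# Made-up vectors: the axes loosely mean "Spain", "France", and "capital city".
ANALOGY_VECTORS = {
  "spain"  => [1.0, 0.0, 0.0],
  "madrid" => [1.0, 0.0, 1.0],
  "france" => [0.0, 1.0, 0.0],
  "paris"  => [0.0, 1.0, 1.0],
  "rome"   => [0.2, 0.1, 1.0]
}

puts toy_analogy(ANALOGY_VECTORS, "spain", "madrid", "france")
```

Real vectors never line up this neatly, which is why the quality of the vectors file matters for the result.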

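Accuracy: test the accuracy of the vectors:

Define a file with the analogies to test. Each section starts with a heading line that begins with ":" followed by the section name, and every following line holds four words (Word1 Word2 Word3 Word4):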


: capital-common-countries
Athens Greece Baghdad Iraq
Athens Greece Bangkok Thailand

require 'word2vec'

model = Word2vec::Model.load(file_name)

# Outputs the results on terminal
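
To check an analogies file before running the accuracy test, the format described above can be parsed with a few lines of plain Ruby. This is only a sketch: `parse_analogies` and `SAMPLE_ANALOGIES` are hypothetical names and not part of the gem, and skipping lines that do not have exactly four words is an assumption of this example.

```ruby
# Parses analogy-test text into { section_name => [[w1, w2, w3, w4], ...] }.
# Lines starting with ":" open a new section; other lines must hold four words.
def parse_analogies(text)
  sections = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line.start_with?(":")
      current = line.delete_prefix(":").strip
      sections[current] # register the section even if it stays empty
    else
      words = line.split
      # Assumption of this sketch: silently skip malformed lines.
      sections[current] << words if current && words.size == 4
    end
  end
  sections
end

SAMPLE_ANALOGIES = <<~TEXT
  : capital-common-countries
  Athens Greece Baghdad Iraq
  Athens Greece Bangkok Thailand
TEXT

parse_analogies(SAMPLE_ANALOGIES).each do |section, rows|
  puts "#{section}: #{rows.size} analogies"
end
```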

Vocabulary: create a vocabulary file from a train file:

require 'word2vec'

Word2vec::Model.build_vocab("./data/text7", "./data/vocab.txt")

The output file will contain a list of words with their number of occurrences, separated by line breaks.
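
A vocabulary file like this can be loaded back into a Hash with plain Ruby. The sketch below assumes that each line holds a word and its count separated by whitespace, which is how the original word2vec tool saves its vocabulary; check the generated file before relying on this. `parse_vocab` is a hypothetical helper, not part of the gem.

```ruby
# Parses vocabulary text into { word => count }.
# Assumed layout (not confirmed by the gem): one "word count" pair per line.
def parse_vocab(text)
  text.each_line
      .map(&:split)
      .select { |fields| fields.size == 2 }
      .to_h { |word, count| [word, Integer(count)] }
end

# Made-up sample content, for illustration only.
vocab = parse_vocab("the 1061396\nof 593677\nand 416629\n")
vocab.each { |word, count| puts "#{word}: #{count}" }
```

For a real file, the text would come from something like `File.read("./data/vocab.txt")`.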

Tokenizer: create a binary file by tokenizing an input file

This method requires a previously created vocabulary file.

require 'word2vec'

Word2vec::Model.tokenize("./data/text7", "./data/vocab.txt", "./data/tokenized.bin")

The output file will contain a sequence of binary identifiers, one for each word of the input file.

Read the output file in C with:

long long id;
fread(&id, sizeof(id), 1, fi);
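
The same file can also be read from Ruby. This sketch assumes that each identifier is a 64-bit signed integer in native byte order (what `long long` is on common platforms) and that the file is read on the same kind of machine that wrote it. `read_token_ids` is a hypothetical helper, not part of the gem, and the example writes its own small file so that it runs on its own.

```ruby
require 'tempfile'

# Reads a file of consecutive 64-bit signed integers (native byte order)
# and returns them as an Array of Integers.
def read_token_ids(path)
  File.binread(path).unpack("q*")
end

# Write a few made-up ids to a temporary file, then read them back.
Tempfile.create("tokenized") do |file|
  file.binmode
  file.write([3, 1, 4, 1, 5].pack("q*"))
  file.flush
  p read_token_ids(file.path)
end
```

With a real tokenized file, the call would be `read_token_ids("./data/tokenized.bin")`.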

Load the word2vec output bin file (vectors.bin) into a Ruby array:

require 'word2vec'

vector_array = Word2vec::Model.load_vectors("./data/minimal.bin")

The vector_array variable will contain an array of pairs, each holding a vocabulary word and the vector of float values for that word.

Set the parameter normalize: true to normalize the vectors:

require 'word2vec'

vector_array = Word2vec::Model.load_vectors("./data/minimal.bin", normalize: true)
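
For intuition about what normalization does, here is a minimal plain-Ruby sketch. It assumes that normalizing means scaling each vector to unit Euclidean length, as the original word2vec distance tool does; the gem's exact behaviour is not confirmed here. `l2_normalize` and `normalize_pairs` are hypothetical helpers, and the pairs are made-up data in the [word, vector] shape described above.

```ruby
# Scales a vector to unit Euclidean (L2) length.
# A zero vector is returned unchanged to avoid dividing by zero.
def l2_normalize(vector)
  norm = Math.sqrt(vector.sum { |x| x * x })
  return vector.dup if norm.zero?

  vector.map { |x| x / norm }
end

# Applies l2_normalize to every [word, vector] pair.
def normalize_pairs(pairs)
  pairs.map { |word, vector| [word, l2_normalize(vector)] }
end

# Made-up pairs, for illustration only.
pairs = [["from", [3.0, 4.0]], ["to", [0.0, 2.0]]]
normalize_pairs(pairs).each do |word, vector|
  puts "#{word} #{vector.inspect}"
end
```

Once vectors have unit length, the cosine similarity of two vectors is simply their dot product.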


After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Launch tests

$ rake spec

Compile the native extension

$ rake compile


Bug reports and pull requests are welcome on GitHub at https://github.com/madcato/word2vec-rb