
Word Embedding in Golang


This is an implementation of word embedding (a.k.a. word representation) models in Golang.

Details

Word embedding maps a word's meaning, structure, and concept into a low-dimensional vector space. A representative example:

Vector("King") - Vector("Man") + Vector("Woman") = Vector("Queen")

As this example shows, relationships between word meanings can be computed with arithmetic operations on their vectors.
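As a rough illustration of the idea (not this repository's API), the following Go sketch answers the analogy above by computing Vector("King") - Vector("Man") + Vector("Woman") and returning the nearest word by cosine similarity. The map of vectors and the toy values are purely illustrative:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity between two vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// analogy computes vecs[a] - vecs[b] + vecs[c] and returns the word
// whose vector is closest to the result under cosine similarity.
func analogy(vecs map[string][]float64, a, b, c string) string {
	dim := len(vecs[a])
	target := make([]float64, dim)
	for i := 0; i < dim; i++ {
		target[i] = vecs[a][i] - vecs[b][i] + vecs[c][i]
	}
	best, bestSim := "", math.Inf(-1)
	for w, v := range vecs {
		if w == a || w == b || w == c {
			continue // skip the query words themselves
		}
		if sim := cosine(target, v); sim > bestSim {
			best, bestSim = w, sim
		}
	}
	return best
}

func main() {
	// Toy 2-dimensional vectors for illustration only.
	vecs := map[string][]float64{
		"king":  {0.9, 0.8},
		"man":   {0.5, 0.1},
		"woman": {0.4, 0.9},
		"queen": {0.8, 1.5},
	}
	fmt.Println(analogy(vecs, "king", "man", "woman")) // queen
}
```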

Features

The following word embedding models are implemented:

Models

  • Word2Vec
    • Distributed Representations of Words and Phrases and their Compositionality [pdf]
  • GloVe
    • GloVe: Global Vectors for Word Representation [pdf]

and more...
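To make the GloVe entry above concrete: the paper trains word vectors by minimizing a weighted least-squares cost over co-occurrence counts. Below is a minimal Go sketch of the cost for a single pair, f(Xij)(wi·w̃j + bi + b̃j - log Xij)², with the paper's usual parameters xmax = 100 and α = 0.75. It is independent of this repository's implementation, and all names are illustrative:

```go
package main

import (
	"fmt"
	"math"
)

// weight is GloVe's weighting function f(x), which down-weights
// rare co-occurrences and caps frequent ones at 1.
func weight(x float64) float64 {
	const xmax, alpha = 100.0, 0.75
	if x < xmax {
		return math.Pow(x/xmax, alpha)
	}
	return 1.0
}

// pairLoss computes the GloVe cost for one co-occurrence pair:
// f(Xij) * (w·wTilde + b + bTilde - log Xij)^2
func pairLoss(w, wTilde []float64, b, bTilde, xij float64) float64 {
	var dot float64
	for i := range w {
		dot += w[i] * wTilde[i]
	}
	diff := dot + b + bTilde - math.Log(xij)
	return weight(xij) * diff * diff
}

func main() {
	w := []float64{0.1, -0.2, 0.3}
	wTilde := []float64{0.0, 0.4, -0.1}
	fmt.Println(pairLoss(w, wTilde, 0.01, -0.02, 25.0))
}
```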

Installation

$ go get -u github.com/ynqa/word-embedding
$ bin/word-embedding -h

Usage

The tools to embed words into vector space

Usage:
  word-embedding [flags]
  word-embedding [command]

Available Commands:
  distance    Estimate the distance between words
  glove       Embed words using glove
  help        Help about any command
  word2vec    Embed words using word2vec

Flags:
  -h, --help   Help for word-embedding

For more information about each sub-command, invoke its help directly:
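$ bin/word-embedding word2vec -h
$ bin/word-embedding glove -h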

Demo

Download the text8 corpus and train a Skip-Gram model with negative sampling:

$ sh demo.sh
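At the core of that training is the negative-sampling update: for each (center, context) pair, the positive pair's score is pushed towards 1 and the scores of a few sampled negative words towards 0. The Go sketch below shows a single stochastic gradient step, assuming vectors are plain []float64 slices; the function and variable names are illustrative, not the repository's:

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(x float64) float64 { return 1.0 / (1.0 + math.Exp(-x)) }

func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// sgnsUpdate performs one gradient step of Skip-Gram with negative
// sampling: center is the input word's vector, ctx the true context
// word's vector, and negs the vectors of sampled negative words.
func sgnsUpdate(center, ctx []float64, negs [][]float64, lr float64) {
	grad := make([]float64, len(center)) // accumulated gradient for center

	// Positive pair: push sigmoid(center·ctx) towards 1.
	g := (sigmoid(dot(center, ctx)) - 1.0) * lr
	for i := range ctx {
		grad[i] += g * ctx[i]
		ctx[i] -= g * center[i]
	}

	// Negative pairs: push sigmoid(center·neg) towards 0.
	for _, neg := range negs {
		g := sigmoid(dot(center, neg)) * lr
		for i := range neg {
			grad[i] += g * neg[i]
			neg[i] -= g * center[i]
		}
	}

	for i := range center {
		center[i] -= grad[i]
	}
}

func main() {
	center := []float64{0.1, 0.2}
	ctx := []float64{0.3, -0.1}
	negs := [][]float64{{-0.2, 0.4}}
	sgnsUpdate(center, ctx, negs, 0.025)
	fmt.Println(center, ctx, negs)
}
```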

Output

The output file follows the format below:

<word> <value1> <value2> ...
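A file in this format can be read back with a few lines of Go. The sketch below is not part of the tool itself, and the path vectors.txt is a hypothetical example:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// loadVectors reads a file in the "<word> <value1> <value2> ..."
// format into a map from word to vector.
func loadVectors(path string) (map[string][]float64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	vecs := make(map[string][]float64)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue // skip empty or malformed lines
		}
		vec := make([]float64, 0, len(fields)-1)
		for _, s := range fields[1:] {
			v, err := strconv.ParseFloat(s, 64)
			if err != nil {
				return nil, err
			}
			vec = append(vec, v)
		}
		vecs[fields[0]] = vec
	}
	return vecs, sc.Err()
}

func main() {
	vecs, err := loadVectors("vectors.txt") // hypothetical output path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(vecs), "vectors loaded")
}
```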

References

  • For a deeper understanding of word embeddings, see:
    • Improving Distributional Similarity with Lessons Learned from Word Embeddings [pdf]
    • Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors [pdf]