The image below illustrates how the algorithm learns to map semantically similar words to nearby points in the embedding space.
This Python package uses PyTorch to implement the Word2Vec algorithm using skip-gram architecture.
We provide the following resources that were used to build this package. We suggest reading these either beforehand or while you're exploring the code.
- Word2Vec paper from Mikolov et al.
- NeurIPS paper with improvements to Word2Vec, also from Mikolov et al.
The Word2Vec algorithm finds much more efficient representations by learning vectors that represent words. These vectors also carry semantic information: words that show up in similar contexts, such as "code", "programming", or "python", end up with vector representations close to each other.
In this implementation, we use the skip-gram architecture because it performs better than Continuous Bag-of-Words. Here, we pass in a word and try to predict the words surrounding it in the text. This way, the network learns representations for words that show up in similar contexts.
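As a concrete illustration of the idea above, here is a minimal sketch of how skip-gram training pairs can be generated from a token sequence (the function name and window size are illustrative, not necessarily what this package uses internally):

```python
# Minimal sketch of skip-gram pair generation: for each center word,
# every word inside the context window becomes a prediction target.
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs from a token sequence."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["i", "like", "programming", "in", "python"], window=2)
```

Each center word produces up to `2 * window` pairs, fewer near the sentence boundaries.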
Hopefully, the following diagram helps settle the intuition:
We used a series of Wikipedia articles provided by Matt Mahoney; you can find a broader description by clicking here.
Below is an approximate diagram of the general structure of the network:
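The structure in the diagram can be sketched in PyTorch as an embedding layer followed by a linear output layer producing scores over the vocabulary. This is a minimal sketch with illustrative hyperparameters, not the exact configuration used in this package:

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    """Minimal skip-gram network: embedding lookup + vocabulary scores."""

    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # One dense vector per vocabulary word.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Scores over the vocabulary for the surrounding (context) word.
        self.output = nn.Linear(embed_dim, vocab_size)

    def forward(self, center_ids):
        x = self.embed(center_ids)   # (batch, embed_dim)
        return self.output(x)        # (batch, vocab_size) logits

model = SkipGram(vocab_size=1000, embed_dim=128)
logits = model(torch.tensor([3, 7]))  # shape: (2, 1000)
```

Training then amounts to minimizing cross-entropy between these logits and the observed context-word indices.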
In this section, we show some preliminary results. But first, let's talk a bit about how we can take advantage of the embeddings.
We can encode a given word as a vector by looking up its row in the trained embedding matrix.
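As a sketch of that lookup (the `word_to_idx` mapping and the tiny embedding here are hypothetical stand-ins for the package's trained artifacts):

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary mapping and a tiny embedding table for illustration.
word_to_idx = {"python": 0, "programming": 1, "code": 2}
embeddings = nn.Embedding(len(word_to_idx), 4)

# Encoding a word = indexing its row in the embedding matrix.
vec = embeddings(torch.tensor(word_to_idx["python"]))  # shape: (4,)
```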
The image below shows some randomly selected words, followed by a set of words with which they share a similar context:
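Lists like these can be produced by ranking the vocabulary by cosine similarity to a query word's vector. A minimal sketch, assuming `weight` holds the trained embedding matrix (e.g. `model.embed.weight.detach()`):

```python
import torch

def nearest(query_idx, weight, k=3):
    """Indices of the k rows most cosine-similar to weight[query_idx]."""
    normed = weight / weight.norm(dim=1, keepdim=True)
    sims = normed @ normed[query_idx]
    sims[query_idx] = -1.0  # exclude the query word itself
    return sims.topk(k).indices

torch.manual_seed(0)
weight = torch.randn(10, 8)  # random stand-in for a trained embedding matrix
neighbors = nearest(2, weight, k=3)
```

Mapping the returned indices back through the vocabulary gives the similar-context words shown in the image.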