The image below illustrates how the algorithm learns to map semantically similar words to nearby points in the embedding space.
This Python package uses PyTorch to implement the Word2Vec algorithm using skip-gram architecture.
We provide the following resources that were used to build this package. We suggest reading these either beforehand or while you're exploring the code.
- Word2Vec paper from Mikolov et al.
- NeurIPS paper with improvements to Word2Vec, also from Mikolov et al.
The Word2Vec algorithm finds much more efficient representations by learning vectors that represent words. These vectors also carry semantic information: words that show up in similar contexts, such as "code", "programming", or "python", end up with vector representations close to each other.
In this implementation, we use the skip-gram architecture because it performs better than Continuous Bag-of-Words. Here, we pass in a word and try to predict the words surrounding it in the text. This way, the network learns representations for words that show up in similar contexts.
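As a concrete illustration of the idea above, here is a minimal sketch of how skip-gram training pairs can be generated from a token sequence (the function name and window size are illustrative, not necessarily what this package uses internally):

```python
# Minimal sketch of skip-gram pair generation: for each center word,
# every word inside the context window becomes a prediction target.
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs from a token sequence."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["i", "like", "programming", "in", "python"], window=2)
```

Each center word produces up to `2 * window` pairs, fewer near the sentence boundaries.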
Hopefully, the following diagram helps settle the intuition:
We used a series of Wikipedia articles provided by Matt Mahoney; you can find a broader description by clicking here.
Below is an approximate diagram of the general structure of the network:
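The structure in the diagram can be sketched in PyTorch as an embedding layer followed by a linear output layer producing scores over the vocabulary. This is a minimal sketch with illustrative hyperparameters, not the exact configuration used in this package:

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    """Minimal skip-gram network: embedding lookup + vocabulary scores."""

    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # One dense vector per vocabulary word.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Scores over the vocabulary for the surrounding (context) word.
        self.output = nn.Linear(embed_dim, vocab_size)

    def forward(self, center_ids):
        x = self.embed(center_ids)   # (batch, embed_dim)
        return self.output(x)        # (batch, vocab_size) logits

model = SkipGram(vocab_size=1000, embed_dim=128)
logits = model(torch.tensor([3, 7]))  # shape: (2, 1000)
```

Training then amounts to minimizing cross-entropy between these logits and the observed context-word indices.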
In this section, we show some preliminary results. But first, let's talk a bit about how we can take advantage of the embeddings.
We can encode a given word as a vector by looking up its row in the trained embedding matrix.
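As a sketch of that lookup (the `word_to_idx` mapping and the tiny embedding here are hypothetical stand-ins for the package's trained artifacts):

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary mapping and a tiny embedding table for illustration.
word_to_idx = {"python": 0, "programming": 1, "code": 2}
embeddings = nn.Embedding(len(word_to_idx), 4)

# Encoding a word = indexing its row in the embedding matrix.
vec = embeddings(torch.tensor(word_to_idx["python"]))  # shape: (4,)
```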
The image below shows some randomly selected words, followed by a set of words with which they share a similar context:
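Lists like these can be produced by ranking the vocabulary by cosine similarity to a query word's vector. A minimal sketch, assuming `weight` holds the trained embedding matrix (e.g. `model.embed.weight.detach()`):

```python
import torch

def nearest(query_idx, weight, k=3):
    """Indices of the k rows most cosine-similar to weight[query_idx]."""
    normed = weight / weight.norm(dim=1, keepdim=True)
    sims = normed @ normed[query_idx]
    sims[query_idx] = -1.0  # exclude the query word itself
    return sims.topk(k).indices

torch.manual_seed(0)
weight = torch.randn(10, 8)  # random stand-in for a trained embedding matrix
neighbors = nearest(2, weight, k=3)
```

Mapping the returned indices back through the vocabulary gives the similar-context words shown in the image.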