Created Date: 12 Feb 2019
Embedding is the process of mapping a word or a piece of text to a vector of real numbers in a continuous vector space, usually of low dimension.
This repository uses Gensim's Word2Vec, fastText, and GloVe.
Gensim is an open-source library for unsupervised topic modeling and natural language processing, using modern statistical machine learning. Gensim is implemented in Python and Cython.
It is developed by RaRe Technologies Ltd.
Download the pretrained model from: https://github.com/RaRe-Technologies/gensim-data
GloVe, coined from Global Vectors, is a model for distributed word representation. The model is an unsupervised learning algorithm for obtaining vector representations for words. This is achieved by mapping words into a meaningful space where the distance between words is related to semantic similarity.
It is developed as an open-source project at Stanford.
Download the pretrained model from: https://github.com/stanfordnlp/GloVe
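The pretrained GloVe vectors are distributed as plain text, one token per line followed by its space-separated vector components. The following is a minimal sketch of reading that format and comparing two words with cosine similarity. The helper names `load_glove_vectors` and `cosine_similarity` are hypothetical, and the three sample vectors are made up for illustration; they are not taken from any real model.

```python
import math


def load_glove_vectors(lines):
    """Parse GloVe-style text lines: a token followed by space-separated floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors, for illustration only.
SAMPLE_LINES = [
    "king 0.8 0.6 0.1",
    "queen 0.7 0.7 0.1",
    "apple 0.1 0.0 0.9",
]

vectors = load_glove_vectors(SAMPLE_LINES)
sim_royal = cosine_similarity(vectors["king"], vectors["queen"])
sim_fruit = cosine_similarity(vectors["king"], vectors["apple"])
print(sim_royal > sim_fruit)
```

With a downloaded vector file, the same function should work on an open file handle, for example `load_glove_vectors(open(path, encoding="utf-8"))`, where `path` points at the unzipped text file.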
fastText is a library for learning word embeddings and for text classification, created by Facebook's AI Research lab. It supports both unsupervised and supervised learning for obtaining vector representations of words.
Download the pretrained model from: https://github.com/facebookresearch/fastText/blob/master/docs/pretrained-vectors.md
- Text similarity using embeddings
- Text classification using embeddings
- Embeddings Visualization
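
The first two items above can be sketched with a very simple approach: represent a text as the average of its word vectors, compare texts with cosine similarity, and classify a text by the label of its most similar labeled example. This is only one possible approach and not necessarily the one used in this repository. The names `embed_text` and `classify` are hypothetical, and `TOY_VECTORS` holds made-up 2-dimensional vectors standing in for a real pretrained model.

```python
import math

# Made-up 2-dimensional word vectors, for illustration only.
TOY_VECTORS = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "pet": [0.85, 0.15],
    "stock": [0.1, 0.9],
    "market": [0.2, 0.8],
    "price": [0.15, 0.85],
}


def embed_text(text, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in text.lower().split() if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text, labeled_examples, vectors):
    """Return the label of the most similar labeled example (1-nearest neighbor)."""
    query = embed_text(text, vectors)
    if query is None:
        return None
    best_label, best_score = None, -2.0  # cosine is never below -1
    for example, label in labeled_examples:
        emb = embed_text(example, vectors)
        if emb is None:
            continue
        score = cosine_similarity(query, emb)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Text similarity: two short texts about animals.
similarity = cosine_similarity(
    embed_text("cat dog", TOY_VECTORS), embed_text("pet", TOY_VECTORS)
)

# Text classification: pick the label of the closest labeled example.
examples = [("cat pet", "animals"), ("stock price", "finance")]
predicted = classify("dog", examples, TOY_VECTORS)
print(similarity, predicted)
```

Averaging ignores word order and treats unknown words as absent, so it is best read as a baseline; swapping `TOY_VECTORS` for vectors loaded from one of the pretrained models above leaves the rest of the sketch unchanged.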