word2vec-api
Simple web service providing a word-embedding API. The methods are based on the Gensim Word2Vec implementation. Models are passed as parameters and must be in the Word2Vec text or binary format.
- Install Dependencies
pip2 install -r requirements.txt
- Launching the service
python word2vec-api.py --model path/to/the/model [--host host --port 1234]
or
python word2vec-api.py --model /path/to/GoogleNews-vectors-negative300.bin --binary BINARY --path /word2vec --host 0.0.0.0 --port 5000
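For orientation, the core of such a service is small: load the vectors with gensim and expose a handful of Flask routes. The sketch below is a simplified illustration only, not the actual word2vec-api source; the route and query-parameter names mirror the example calls in the next section, and the argument handling is reduced to the essentials.

    import argparse
    from flask import Flask, request
    from gensim.models import KeyedVectors

    app = Flask(__name__)
    model = None  # assigned at startup below

    @app.route('/word2vec/similarity')
    def similarity():
        # cosine similarity between two words, e.g. ?w1=Sushi&w2=Japanese
        return str(float(model.similarity(request.args['w1'], request.args['w2'])))

    @app.route('/word2vec/n_similarity')
    def n_similarity():
        # similarity between two word sets; repeated parameters become lists,
        # e.g. ?ws1=Sushi&ws1=Shop&ws2=Japanese&ws2=Restaurant
        return str(float(model.n_similarity(request.args.getlist('ws1'),
                                            request.args.getlist('ws2'))))

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--model', required=True)
        parser.add_argument('--binary', action='store_true')  # simplified; the real flag takes a value
        parser.add_argument('--host', default='0.0.0.0')
        parser.add_argument('--port', type=int, default=5000)
        args = parser.parse_args()
        model = KeyedVectors.load_word2vec_format(args.model, binary=args.binary)
        app.run(host=args.host, port=args.port)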
- Example calls
curl "http://127.0.0.1:5000/word2vec/n_similarity?ws1=Sushi&ws1=Shop&ws2=Japanese&ws2=Restaurant"
curl "http://127.0.0.1:5000/word2vec/similarity?w1=Sushi&w2=Japanese"
curl "http://127.0.0.1:5000/word2vec/most_similar?positive=indian&positive=food[&negative=][&topn=]"
curl http://127.0.0.1:5000/word2vec/model?word=restaurant
curl http://127.0.0.1:5000/word2vec/model_word_set
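The same endpoints can be called from Python as well. A minimal example using the requests library, assuming the server is running locally on port 5000 as launched above:

    import requests

    BASE = 'http://127.0.0.1:5000/word2vec'

    # similarity between two single words
    print(requests.get(BASE + '/similarity',
                       params={'w1': 'Sushi', 'w2': 'Japanese'}).text)

    # similarity between two word sets; list values are sent as repeated parameters
    print(requests.get(BASE + '/n_similarity',
                       params={'ws1': ['Sushi', 'Shop'],
                               'ws2': ['Japanese', 'Restaurant']}).text)

    # nearest neighbours of "indian food"
    print(requests.get(BASE + '/most_similar',
                       params={'positive': ['indian', 'food'], 'topn': 5}).text)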
Note: The "model" method returns a base64 encoding of the vector. "model_word_set" returns a base64 encoded pickle of the model's vocabulary.
- Where to get a pretrained model
If you do not have domain-specific data to train on, it can be convenient to use a pretrained model. Please feel free to submit additions to this list through a pull request.
Model file | Number of dimensions | Corpus (size) | Vocabulary size | Author | Architecture | Training Algorithm | Context window size | Web page
---|---|---|---|---|---|---|---|---
Google News | 300 | Google News (100B) | 3M | Google | word2vec | negative sampling | BoW - ~5 | link
Freebase IDs | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW - ~10 | link
Freebase names | 1000 | Google News (100B) | 1.4M | Google | word2vec, skip-gram | ? | BoW - ~10 | link
Wikipedia+Gigaword 5 | 50 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link
Wikipedia+Gigaword 5 | 100 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link
Wikipedia+Gigaword 5 | 200 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link
Wikipedia+Gigaword 5 | 300 | Wikipedia+Gigaword 5 (6B) | 400,000 | GloVe | GloVe | AdaGrad | 10+10 | link
Common Crawl 42B | 300 | Common Crawl (42B) | 1.9M | GloVe | GloVe | AdaGrad | ? | link
Common Crawl 840B | 300 | Common Crawl (840B) | 2.2M | GloVe | GloVe | AdaGrad | ? | link
Twitter (2B Tweets) | 25 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link
Twitter (2B Tweets) | 50 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link
Twitter (2B Tweets) | 100 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link
Twitter (2B Tweets) | 200 | Twitter (27B) | ? | GloVe | GloVe | AdaGrad | ? | link
Wikipedia dependency | 300 | Wikipedia (?) | 174,015 | Levy & Goldberg | word2vec modified | word2vec | syntactic dependencies | link
DBPedia vectors (wiki2vec) | 1000 | Wikipedia (?) | ? | Idio | word2vec | word2vec, skip-gram | BoW, 10 | link
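Note that the GloVe downloads in the table are plain-text vector files without the header line that the word2vec text format expects, so they cannot be passed to --model as-is. One way to convert them with gensim is sketched below; the glove2word2vec helper ships with gensim 3.x, while newer gensim releases can instead load GloVe text directly via load_word2vec_format(..., no_header=True). The file names are only examples.

    # convert a GloVe text file to word2vec text format by prepending the
    # "<vocabulary size> <dimensions>" header line that gensim expects
    from gensim.scripts.glove2word2vec import glove2word2vec

    glove2word2vec('glove.6B.300d.txt', 'glove.6B.300d.word2vec.txt')

The converted file can then be served like any other text-format (non-binary) model via --model glove.6B.300d.word2vec.txt.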