business_embeddings: A Python repository from yaoyang33

Yelp Embeddings

This repo is based on https://github.com/acocos/business_embeddings, with updates and bug fixes to the original code.

This repo contains code used to generate business embeddings for the Yelp Academic Dataset as detailed in this blog.

| file/directory | description |
| --- | --- |
| `src/pipeline.sh` | This script demonstrates how to extract business/context pairs from the Yelp data and use them to train word embeddings using `word2vecf`. You can run this pipeline (after downloading the Yelp data and installing `word2vecf`, see below). |
| `src/extract_contexts.py` | Generates business/context pairs from the Yelp data |
| `src/infer.py` | Script from the original `word2vecf` code, useful for loading and manipulating the resulting vectors |
| `src/examine_places.py` | Script used to produce results given in blog post |
| `data/` | Download the Yelp data and extract it to this directory |
| `data/processed` | If you run `pipeline.sh`, the resulting vectors will end up here. Or you can download them and put them there on your own. |
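
If you just want to poke at trained vectors without the helper scripts, the sketch below shows one way to do it. It assumes the vectors are stored in the plain-text word2vec format (a header line with the vector count and dimension, then one `key v1 v2 ...` line per business); check your own output file before relying on that. The names `parse_text_vectors`, `cosine`, and `most_similar` are hypothetical helpers written for this example and are not part of `src/infer.py`.

```python
import math


def parse_text_vectors(lines):
    """Parse word2vec-style text vectors into a {key: [floats]} dict.

    Assumption: the first line is "<count> <dim>" and every following
    line is "<key> <v1> ... <vdim>".
    """
    it = iter(lines)
    _count, dim = (int(x) for x in next(it).split())
    vectors = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, key, topn=5):
    """Return the topn (key, similarity) pairs closest to `key`."""
    query = vectors[key]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != key
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Tiny made-up example; real keys would be Yelp business IDs.
sample = ["3 2", "b1 1.0 0.0", "b2 0.9 0.1", "b3 0.0 1.0"]
vectors = parse_text_vectors(sample)
neighbors = most_similar(vectors, "b1", topn=2)
print(neighbors)
```

To use it on a real file, pass an open file handle instead of `sample`, since `parse_text_vectors` only needs an iterable of lines.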

The code in this repo depends on `word2vecf`, an adaptation of the popular word2vec software that allows arbitrary contexts to be used to train vectors. It was developed by researchers at Bar-Ilan University and is available here. You'll need to download and install it before running the pipeline to train the vectors on your own.
If you want to train your own vectors, you'll also need to download the Yelp data and extract it to the `./data` directory.
Once you have done those two things, you will be able to run `src/pipeline.sh` to generate your own business embeddings.
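
To give a feel for what "business/context pairs" look like, here is a minimal sketch. `word2vecf` reads its training data as one whitespace-separated `word context` pair per line, and the Yelp reviews come as one JSON object per line with `business_id` and `text` fields. Treating each lowercased review token as a context for the reviewed business is an assumption made for this example; the contexts actually produced by `src/extract_contexts.py` may differ. The function names are hypothetical.

```python
import json
import re

# Assumption: a simple tokenizer is good enough for illustration.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def iter_business_context_pairs(review_lines):
    """Yield (business_id, context) pairs from JSON-lines review records.

    Assumption: each review token is used as a context for the business.
    """
    for line in review_lines:
        line = line.strip()
        if not line:
            continue
        review = json.loads(line)
        business_id = review["business_id"]
        for token in TOKEN_RE.findall(review.get("text", "").lower()):
            yield business_id, token


def format_pairs(pairs):
    """Render pairs as one 'word context' pair per line."""
    return "\n".join(f"{word} {context}" for word, context in pairs)


# Made-up records standing in for lines of the Yelp review file.
sample = [
    '{"business_id": "b1", "text": "Great tacos, great salsa"}',
    '{"business_id": "b2", "text": "Slow service"}',
]
pairs = list(iter_business_context_pairs(sample))
print(format_pairs(pairs))
```

A generator keeps memory use flat, which matters because the review file is large; in practice you would stream the open file through `iter_business_context_pairs` and write each pair out as you go.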