PredictionIO Universal Recommender for Heroku
A fork of the Universal Recommender version 0.5.0 deployable with the PredictionIO buildpack for Heroku. Due to substantial revisions to support Elasticsearch on Heroku, this fork lags behind the main UR; conceptual differences beyond version 0.5.0 are listed in the UR release log.
The Universal Recommender (UR) is a new type of collaborative filtering recommender based on an algorithm that can use data from a wide variety of user taste indicators; it is called the Correlated Cross-Occurrence algorithm. …CCO is able to ingest any number of user actions, events, profile data, and contextual information. It then serves results in a fast and scalable way. It also supports item properties for filtering and boosting recommendations and can therefore be considered a hybrid collaborative filtering and content-based recommender.
The Heroku app depends on:
- Bonsai Add-on to provide the search engine (Elasticsearch 5.x)
- Heroku Postgres Add-on to provide the database
This engine demonstrates recommendation of items for a mobile phone user based on their purchase history. The model is trained with a small example data set.
✏️ Throughout this document, code terms that start with $ represent a value (shell variable) that should be replaced with a customized value, e.g. $ENGINE_NAME
…
- ⚠️ Requirements
- 🚀 Demo Deployment
- 🎯 Query for predictions
- 🛠 Local development
- 🎛 Configuration options
- Heroku account
- Heroku CLI, command-line tools
- git
This is an adaptation of the normal PIO engine deployment.
git clone \
https://github.com/heroku/predictionio-engine-ur.git \
pio-engine-ur
cd pio-engine-ur
heroku create $ENGINE_NAME
heroku buildpacks:add https://github.com/heroku/predictionio-buildpack.git
heroku config:set \
PIO_EVENTSERVER_APP_NAME=ur \
PIO_EVENTSERVER_ACCESS_KEY=$RANDOM-$RANDOM-$RANDOM-$RANDOM-$RANDOM-$RANDOM \
PIO_UR_ELASTICSEARCH_CONCURRENCY=1
heroku addons:create bonsai --as PIO_ELASTICSEARCH --version 5.4
Ensure the --version you specify is a currently supported version.
In the Bonsai add-on's dashboard, verify that Elasticsearch is really the requested version. Only versions greater than 5.1 will work with this Heroku app. Caution: it's easy to accidentally provision the wrong version.
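The version check can also be done from the command line. The following is a minimal sketch, not part of this repo: the helper names `es_version_from_json` and `es_version_supported` are hypothetical, the config var name `PIO_ELASTICSEARCH_URL` is assumed to follow from the `--as PIO_ELASTICSEARCH` attachment, and "greater than 5.1" is read here as 5.2 or later within the 5.x line.

```shell
# Hypothetical helper: read the JSON that Elasticsearch returns at its root URL
# on stdin and print the value of the first "number" field (the version).
es_version_from_json() {
  grep -o '"number" *: *"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# Hypothetical helper: exit 0 when the version is 5.x with a minor release above 1.
# Assumption: "greater than 5.1" means 5.2 and later, still within 5.x.
es_version_supported() {
  local major minor rest
  IFS=. read -r major minor rest <<< "$1"
  [ "$major" = "5" ] && [ "${minor:-0}" -gt 1 ] 2>/dev/null
}

# Example (needs network access and a provisioned add-on):
#   version="$(curl -s "$(heroku config:get PIO_ELASTICSEARCH_URL)" | es_version_from_json)"
#   es_version_supported "$version" && echo "OK: $version" || echo "Unsupported: $version"
```

The dashboard remains the authoritative place to confirm the version; this only saves a trip there when scripting a deployment.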
heroku addons:create heroku-postgresql:hobby-dev
- Use a higher-level, paid plan for anything but a small demo.
- hobby-basic is the smallest paid heroku-postgresql plan
Initial training data is automatically imported from data/initial-events.json.
👓 When you're ready to begin working with your own data, read about strategies for importing data.
git push heroku master
# Follow the logs to see training & web start-up
heroku logs -t
Once deployed, scale up the processes to avoid memory issues:
heroku ps:scale \
web=1:Standard-2X \
release=0:Performance-L \
train=0:Performance-L
💵 These are paid, professional dyno types
When the release (pio train) fails due to memory constraints or another transient error, you may use the Heroku CLI releases:retry plugin to rerun the release without pushing a new deployment:
# First time, install it.
heroku plugins:install heroku-releases-retry
# Re-run the release & watch the logs
heroku releases:retry
heroku logs -t
Once deployment completes, the engine is ready to recommend items for a mobile phone user based on their purchase history.
Get all recommendations for a user:
# an Android user
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json" \
-d $'{"user": "100"}'
# an iPhone user
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json" \
-d $'{"user": "200"}'
Get recommendations for a user, excluding phones:
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json" \
-d $'{
"user": "100",
"fields": [{
"name": "category",
"values": ["phone"],
"bias": 0
}]
}'
Get accessory recommendations for a user excluding phones & boosting power-related items:
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json" \
-d $'{
"user": "100",
"fields": [{
"name": "category",
"values": ["phone"],
"bias": 0
},{
"name": "category",
"values": ["power"],
"bias": 1.5
}]
}'
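Hand-written JSON bodies are easy to get wrong once several field rules are combined. As a rough sketch, the two hypothetical helpers below (`ur_field` and `ur_query_body`, not part of this repo) assemble the same kind of body as the queries above, using a bias of 0 to exclude and 1.5 to boost as in those examples. They do no JSON escaping, so names and values are assumed to contain no quotes or backslashes.

```shell
# Hypothetical helper: print one entry for the "fields" array.
# Usage: ur_field NAME BIAS VALUE [VALUE...]
ur_field() {
  local name="$1" bias="$2" values="" v
  shift 2
  for v in "$@"; do
    # Append each value as a quoted string, comma-separated.
    values="${values:+$values, }\"$v\""
  done
  printf '{"name": "%s", "values": [%s], "bias": %s}' "$name" "$values" "$bias"
}

# Hypothetical helper: print a query body for a user plus any number of field entries.
# Usage: ur_query_body USER [FIELD_JSON...]
ur_query_body() {
  local user="$1" fields="" f
  shift
  for f in "$@"; do
    fields="${fields:+$fields, }$f"
  done
  printf '{"user": "%s", "fields": [%s]}' "$user" "$fields"
}

# Example: exclude phones and boost power-related items for user 100.
#   curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
#     -H "Content-Type: application/json" \
#     -d "$(ur_query_body 100 "$(ur_field category 0 phone)" "$(ur_field category 1.5 power)")"
```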
For a user with no purchase history, the recommendations will be based on popularity:
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json" \
-d $'{"user": "000"}'
Get recommendations based on similarity with an item:
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json" \
-d $'{"item": "101"}'
Get recommendations for a user boosting on similarity with an item:
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json" \
-d $'{
"user": "100",
"item": "101"
}'
👓 See the main Universal Recommender query docs for more parameters. Please note those docs have been updated for the newest version 0.6.0, but this repo provides version 0.5.0. Differences are listed in the UR release log.
Start in this repo's working directory. If you don't already have it cloned, then do it now:
git clone \
https://github.com/heroku/predictionio-engine-ur.git \
pio-engine-ur
cd pio-engine-ur
➡️ Set up local development including Elasticsearch.
bin/pio status should succeed when this setup is complete.
bin/pio app new ur
PIO_EVENTSERVER_APP_NAME=ur data/import-events -f data/initial-events.json
bin/pio build
bin/pio train -- --driver-memory 2500m
bin/pio deploy
curl -X "POST" "http://127.0.0.1:8000/queries.json" \
-H "Content-Type: application/json" \
-d $'{
"user": "100",
"fields": [{
"name": "category",
"values": ["phone"],
"bias": 0
}]
}'
PIO_UR_ELASTICSEARCH_CONCURRENCY
- may be increased in line with the Bonsai Add-on plan's value for Concurrent Indexing
- the max for a dedicated Elasticsearch cluster is "unlimited", but in practice set it to match the number of Spark executor cores
PIO_UR_ELASTICSEARCH_INDEX_REPLICAS
- more replicas may improve concurrent search performance
- should increase in line with the number of Elasticsearch nodes (n-1) in the cluster
- takes effect after the next training, when a new index is inserted
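Both options are ordinary Heroku config vars, so they can be set the same way as the variables in the deployment steps. The sketch below is illustrative only: `ur_index_replicas` is a hypothetical helper that applies the n-1 guideline, and the concurrency value of 4 and node count of 3 are made-up examples, not recommendations.

```shell
# Hypothetical helper: suggested replica count for a cluster of N Elasticsearch
# nodes, following the n-1 guideline and never going below 0.
ur_index_replicas() {
  local nodes="$1"
  if [ "$nodes" -gt 1 ]; then
    echo $(( nodes - 1 ))
  else
    echo 0
  fi
}

# Example (values are placeholders; match them to your own plan and cluster):
#   heroku config:set \
#     PIO_UR_ELASTICSEARCH_CONCURRENCY=4 \
#     PIO_UR_ELASTICSEARCH_INDEX_REPLICAS="$(ur_index_replicas 3)"
```

Because the replica setting only applies when a new index is created, a change like this shows up after the next training run rather than immediately.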