PredictionIO regression engine for Heroku
A machine learning app deployable to Heroku with the PredictionIO buildpack.
This engine includes 5 different Spark MLlib regression algorithms.
It predicts a student's class grade based on their aptitude score. Several different models are trained with a small example data set. The engine combines the predictions from each algorithm and returns a single averaged prediction.
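To illustrate what "a single averaged prediction" means, here is a minimal sketch of the arithmetic only. The five values are invented for illustration and are not output from the engine; the engine itself combines its models internally, and this snippet does not reproduce that code.

```shell
# Hypothetical per-algorithm predictions for one query (illustrative values only).
predictions="85.0 86.5 87.5 88.5 90.0"

# Arithmetic mean: sum the values, then divide by how many there are.
# $predictions is left unquoted on purpose so each value lands on its own line.
average=$(printf '%s\n' $predictions | awk '{ sum += $1 } END { printf "%.2f", sum / NR }')
echo "$average"
```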
✏️ Throughout this document, code terms that start with $ represent a value (shell variable) that should be replaced with a customized value, e.g. $EVENTSERVER_NAME, $ENGINE_NAME, $POSTGRES_ADDON_ID…
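One way to handle these placeholders is to define them once as shell variables and let the shell substitute them. A small sketch follows; the app name my-regression-engine is a made-up example, so choose your own.

```shell
# Hypothetical value; pick your own unique Heroku app name.
ENGINE_NAME=my-regression-engine

# The shell substitutes the variable wherever $ENGINE_NAME appears,
# for example in the query URL used later in this document.
QUERY_URL="http://$ENGINE_NAME.herokuapp.com/queries.json"
echo "$QUERY_URL"
```

Variables set this way last only for the current terminal session.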
Please follow the steps in order.
Once deployed, how to work with the engine.
- Heroku account
- Heroku CLI, command-line tools
- git
git clone \
https://github.com/heroku/predictionio-engine-regression.git \
pio-engine-regress
cd pio-engine-regress
heroku create $ENGINE_NAME --buildpack https://github.com/heroku/predictionio-buildpack.git
heroku addons:create heroku-postgresql:hobby-dev
heroku config:set \
PIO_EVENTSERVER_APP_NAME=regression \
PIO_EVENTSERVER_ACCESS_KEY=$RANDOM-$RANDOM-$RANDOM-$RANDOM
Initial training data is automatically imported from data/initial-events.json.
👓 When you're ready to begin working with your own data, see data import methods in CUSTOM docs.
git push heroku master
# Follow the logs to see training & web start-up
#
heroku logs -t
Once deployed, scale up the processes and configure Spark to avoid memory issues. These are paid, professional dyno types:
heroku ps:scale \
web=1:Standard-2X \
release=0:Performance-L \
train=0:Performance-L
When the release (pio train) fails due to memory constraints or another transient error, you can use the Heroku CLI releases:retry plugin to rerun the release without pushing a new deployment:
# First time, install it.
heroku plugins:install heroku-releases-retry
# Re-run the release & watch the logs
heroku releases:retry
heroku logs -t
The engine can be queried with values that range from 30 to 95.
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
-H "Content-Type: application/json; charset=utf-8" \
-d $'{"vector": [ 75 ]}'
The engine returns an averaged prediction:
{
"prediction": 87.49467164813471
}
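When querying from a script, it can help to build the request body in one place and reject out-of-range scores before calling the engine. The sketch below assumes whole-number scores and treats 30 to 95 as a hard limit; both are assumptions made for this example, and the helper names build_query and query_engine are hypothetical.

```shell
# Build the JSON body for a single aptitude score.
# Assumption: scores are whole numbers within the 30 to 95 range noted above.
build_query() {
  score=$1
  case $score in
    ''|*[!0-9]*) echo "score must be a whole number" >&2; return 1 ;;
  esac
  if [ "$score" -lt 30 ] || [ "$score" -gt 95 ]; then
    echo "score must be between 30 and 95" >&2
    return 1
  fi
  printf '{"vector": [ %s ]}' "$score"
}

# Send the query; requires a deployed engine and $ENGINE_NAME to be set.
query_engine() {
  body=$(build_query "$1") || return 1
  curl -s -X POST "http://$ENGINE_NAME.herokuapp.com/queries.json" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d "$body"
}
```

For example, query_engine 75 sends the same request as the curl command above.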
Three of the algorithms (linear, ridge & lasso regression) used in this engine are set up for PredictionIO's hyperparameter tuning.
Start up a one-off dyno:
heroku run bash --size Performance-L
To run evaluation for the standard linear regression algorithm:
$ PredictionIO-dist/bin/pio eval \
org.template.regression.SGDMeanSquaredErrorEvaluation \
org.template.regression.SGDEngineParamsList \
-- $PIO_SPARK_OPTS
For the Lasso algorithm:
$ PredictionIO-dist/bin/pio eval \
org.template.regression.LassoMeanSquaredErrorEvaluation \
org.template.regression.LassoEngineParamsList \
-- $PIO_SPARK_OPTS
For the Ridge algorithm:
$ PredictionIO-dist/bin/pio eval \
org.template.regression.RidgeMeanSquaredErrorEvaluation \
org.template.regression.RidgeEngineParamsList \
-- $PIO_SPARK_OPTS
✏️ Memory parameters are set to fit the dyno --size set in the heroku run command.
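The three evaluation commands differ only in the class-name prefix (SGD, Lasso, Ridge). As a convenience, a small helper can print the command for each one so it can be reviewed before running. The function name eval_command is hypothetical, and the sketch only prints commands rather than running them.

```shell
# Print the pio eval command for one of the tunable algorithms.
# Accepted prefixes are taken from the class names above: SGD, Lasso, Ridge.
eval_command() {
  case $1 in
    SGD|Lasso|Ridge) ;;
    *) echo "unknown algorithm: $1" >&2; return 1 ;;
  esac
  # \$PIO_SPARK_OPTS is escaped so it is printed literally, not expanded here.
  printf '%s\n' "PredictionIO-dist/bin/pio eval org.template.regression.${1}MeanSquaredErrorEvaluation org.template.regression.${1}EngineParamsList -- \$PIO_SPARK_OPTS"
}

# Dry run: list all three commands.
for algo in SGD Lasso Ridge; do
  eval_command "$algo"
done
```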
If you hit any snags with the engine serving queries, check the logs:
heroku logs -t --app $ENGINE_NAME
If errors are occurring, sometimes a restart will help:
heroku restart --app $ENGINE_NAME
If you want to customize an engine, then you'll need to get it running locally on your computer.
➡️ Setup local development
bin/pio app new regress
PIO_EVENTSERVER_APP_NAME=regress data/import-events -f data/initial-events.json
bin/pio build
bin/pio train
bin/pio deploy