
PredictionIO regression engine, preset for Heroku buildpack

Primary language: Scala. License: Apache License 2.0 (Apache-2.0).

⚠️ This project is no longer active. No further updates are planned.

PredictionIO regression engine for Heroku

A machine learning app deployable to Heroku with the PredictionIO buildpack.

This engine includes five different Spark MLlib regression algorithms:

Demo Story 🐸

Predict a student's class grade based on their aptitude score. Several different models are trained on a small example data set. The engine combines the predictions from each algorithm and returns a single averaged prediction.
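The combining step is an arithmetic mean of the per-algorithm predictions. Here is a minimal shell sketch of that arithmetic, assuming each algorithm's prediction is already available as a number. `average_predictions` is a hypothetical helper for illustration. It is not part of the engine, which does its averaging internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the engine): print the arithmetic mean
# of its numeric arguments, one argument per algorithm's prediction.
average_predictions() {
  if [ "$#" -eq 0 ]; then
    echo "average_predictions: need at least one prediction" >&2
    return 1
  fi
  # awk does the floating-point math that plain bash arithmetic cannot.
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.4f\n", sum / NR }'
}
```

For example, with three made-up predictions, `average_predictions 85.0 88.5 89.0` prints `87.5000`.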

How To 📚

✏️ Throughout this document, code terms that start with $ represent a value (shell variable) that should be replaced with a customized value, e.g. $EVENTSERVER_NAME, $ENGINE_NAME, $POSTGRES_ADDON_ID
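One way to fill in these placeholders is to export them once per shell session and have your own scripts fail fast when one is missing. Below is a minimal sketch. The values shown are made-up examples, and `require_vars` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Example placeholder values (made up; replace with your own).
export ENGINE_NAME="my-pio-regression"
export EVENTSERVER_NAME="my-pio-eventserver"

# Hypothetical helper: return non-zero and list every named variable
# that is unset or empty.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "Missing required variable: \$$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Returns a non-zero status if either variable is missing.
require_vars ENGINE_NAME EVENTSERVER_NAME
```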

Deploy to Heroku

Please follow the steps in order.

  1. Requirements
  2. Regression engine
    1. Create the engine
    2. Import data
    3. Deploy the engine
    4. Scale-up
    5. Retry release
    6. Evaluation
  3. Local development

Usage

Once deployed, how to work with the engine.

Deploy to Heroku 🚀

Requirements

Regression Engine

Create the engine

git clone \
  https://github.com/heroku/predictionio-engine-regression.git \
  pio-engine-regress
  
cd pio-engine-regress

heroku create $ENGINE_NAME --buildpack https://github.com/heroku/predictionio-buildpack.git
heroku addons:create heroku-postgresql:hobby-dev
heroku config:set \
  PIO_EVENTSERVER_APP_NAME=regression \
  PIO_EVENTSERVER_ACCESS_KEY=$RANDOM-$RANDOM-$RANDOM-$RANDOM
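`$RANDOM` is a bash built-in that expands to an integer from 0 to 32767, so the command above sets a key made of four such numbers joined by hyphens. That is convenient for a demo, but `$RANDOM` is not a cryptographic random source. If you want a longer key, one option is to read bytes from `/dev/urandom`. This is our suggestion, not something the buildpack requires, and `generate_access_key` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a 32-character hex key built from 16 random bytes.
generate_access_key() {
  # od dumps the bytes as hex pairs; tr strips od's spaces and newlines.
  od -An -tx1 -N16 /dev/urandom | tr -d ' \n'
  echo
}

# Usage with the config:set command above:
# heroku config:set --app "$ENGINE_NAME" PIO_EVENTSERVER_ACCESS_KEY="$(generate_access_key)"
```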

Import data

Initial training data is automatically imported from data/initial-events.json.

👓 When you're ready to begin working with your own data, see the data import methods in the CUSTOM docs.

Deploy the engine

git push heroku master

# Follow the logs to see training & web start-up
#
heroku logs -t

⚠️ The initial deploy will probably fail due to memory constraints. Proceed to scale up.

Scale up

Once the app is deployed, scale up the processes and configure Spark to avoid memory issues. These are paid, professional dyno types:

heroku ps:scale \
  web=1:Standard-2X \
  release=0:Performance-L \
  train=0:Performance-L

Retry release

When the release (pio train) fails due to memory constraints or another transient error, you can use the Heroku CLI releases:retry plugin to rerun the release without pushing a new deployment:

# First time, install it.
heroku plugins:install heroku-releases-retry

# Re-run the release & watch the logs
heroku releases:retry
heroku logs -t

Usage ⌨️

Query for predictions

The engine can be queried with aptitude scores ranging from 30 to 95.

curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d $'{"vector": [ 75 ]}'
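The request body is a JSON object whose `vector` holds the single input value. The following is a minimal sketch that checks the input against the 30 to 95 range before building that body. `build_query` is a hypothetical helper, and its range check simply mirrors the guidance above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the query JSON for one aptitude score,
# refusing anything that is not a plain number from 30 to 95.
build_query() {
  local score="$1"
  # awk checks the format and the range; exit status 0 means "acceptable".
  if ! awk -v s="$score" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 30 && s + 0 <= 95) }'; then
    echo "build_query: '$score' is not a number from 30 to 95" >&2
    return 1
  fi
  printf '{"vector": [ %s ]}\n' "$score"
}

# Usage (same request as above):
# curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
#      -H "Content-Type: application/json; charset=utf-8" \
#      -d "$(build_query 75)"
```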

The engine returns an averaged prediction:

{
  "prediction": 87.49467164813471
}
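To use the result in a script, extract the number from the response. With `jq` installed, `jq -r .prediction` does this. The sketch below avoids that dependency by using `sed`, assuming the response keeps the single-field shape shown above. `extract_prediction` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a {"prediction": <number>} response on stdin
# and print just the number. Assumes the single-field shape shown above.
extract_prediction() {
  # Join lines first so both pretty-printed and compact JSON work.
  tr -d '\n' | sed -n 's/.*"prediction"[[:space:]]*:[[:space:]]*\([-0-9.eE+]*\).*/\1/p'
}

# Usage:
# curl -s -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
#      -H "Content-Type: application/json; charset=utf-8" \
#      -d '{"vector": [ 75 ]}' | extract_prediction
```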

Evaluation

Three of the algorithms used in this engine (linear, ridge, and lasso regression) are set up for PredictionIO's hyperparameter tuning.
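As the class names below indicate, each evaluation scores its parameter settings by mean squared error (MSE). MSE is the average of (actual - predicted)^2 over the evaluation examples, and lower is better. The following minimal awk-based sketch of that formula is for intuition only. It is not how PredictionIO computes the metric internally, and `mean_squared_error` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "actual predicted" pairs (one pair per line)
# on stdin and print their mean squared error. Fails if there are no pairs.
mean_squared_error() {
  awk 'NF >= 2 { d = $1 - $2; sum += d * d; n++ }
       END {
         if (n == 0) exit 1
         printf "%.4f\n", sum / n
       }'
}
```

For example, the made-up pairs `90 88` and `80 83` have squared errors 4 and 9, so `printf '90 88\n80 83\n' | mean_squared_error` prints `6.5000`.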

Start up a one-off dyno:

heroku run bash --size Performance-L

To run evaluation for the standard linear regression algorithm:

PredictionIO-dist/bin/pio eval \
    org.template.regression.SGDMeanSquaredErrorEvaluation \
    org.template.regression.SGDEngineParamsList \
    -- $PIO_SPARK_OPTS

For the Lasso algorithm:

PredictionIO-dist/bin/pio eval \
    org.template.regression.LassoMeanSquaredErrorEvaluation \
    org.template.regression.LassoEngineParamsList \
    -- $PIO_SPARK_OPTS

For the Ridge algorithm:

PredictionIO-dist/bin/pio eval \
    org.template.regression.RidgeMeanSquaredErrorEvaluation \
    org.template.regression.RidgeEngineParamsList \
    -- $PIO_SPARK_OPTS
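The three evaluation commands differ only in the class-name prefix (`SGD`, `Lasso`, `Ridge`), so they can be run in sequence from the one-off dyno with a small loop. This sketch assumes the class names shown above. `run_all_evals` is a hypothetical helper, and so is the `PIO` override: setting `PIO=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the SGD (linear), Lasso, and Ridge evaluations
# in turn, stopping at the first one that fails.
run_all_evals() {
  local pio="${PIO:-PredictionIO-dist/bin/pio}" prefix
  for prefix in SGD Lasso Ridge; do
    echo "==> Evaluating $prefix" >&2
    # $PIO_SPARK_OPTS is intentionally unquoted so it splits into separate options.
    "$pio" eval \
      "org.template.regression.${prefix}MeanSquaredErrorEvaluation" \
      "org.template.regression.${prefix}EngineParamsList" \
      -- $PIO_SPARK_OPTS || return 1
  done
}

# Usage, from the one-off dyno:
# run_all_evals
```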

✏️ The memory parameters are sized to fit the dyno --size chosen in the heroku run command.

Diagnostics

If you hit any snags with the engine serving queries, check the logs:

heroku logs -t --app $ENGINE_NAME

If errors are occurring, a restart sometimes helps:

heroku restart --app $ENGINE_NAME
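When the log stream is noisy, it can help to fetch recent lines and keep only the suspicious ones. This sketch assumes that `error`, `exception`, and `OutOfMemory` are useful markers. They are common in JVM and Spark output, but adjust the pattern to what you actually see. `filter_errors` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines that look like failures (case-insensitive).
filter_errors() {
  grep -iE 'error|exception|outofmemory'
}

# Usage: scan the most recent 1500 lines (the maximum `heroku logs -n` returns).
# heroku logs -n 1500 --app "$ENGINE_NAME" | filter_errors
```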

Local Development

If you want to customize an engine, then you'll need to get it running locally on your computer.

➡️ Set up local development

Import sample data

bin/pio app new regress
PIO_EVENTSERVER_APP_NAME=regress data/import-events -f data/initial-events.json

Run pio

bin/pio build
bin/pio train
bin/pio deploy
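These three commands must run in order, and each one depends on the previous step succeeding. The following small wrapper stops at the first failing step. `run_pio_pipeline` is a hypothetical helper, and so is the `PIO` override: setting `PIO=echo` prints the steps instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run pio build, train, and deploy in order,
# stopping at the first step that fails.
run_pio_pipeline() {
  local pio="${PIO:-bin/pio}" step
  for step in build train deploy; do
    echo "==> pio $step" >&2
    if ! "$pio" "$step"; then
      echo "pio $step failed; not continuing" >&2
      return 1
    fi
  done
}

# Usage, from the engine's directory:
# run_pio_pipeline
```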