Heroku buildpack for PredictionIO

Enables data scientists and developers to deploy custom machine learning services created with PredictionIO.

This buildpack is part of an exploration into utilizing the Heroku developer experience to simplify data science operations. When considering this proof-of-concept technology, please note its current limitations. We'd love to hear from you: open an issue on this repo with feedback or questions.

Releases

September 28th, 2017: PredictionIO 0.12.0-incubating is now supported.

See all releases with their changes.

Engines

Create and deploy engines for PredictionIO versions:

0.12.0-incubating
- Scala 2.11.8, Spark 2.1.1, & Hadoop 2.7.3
- specify these versions in the engine template's configs
0.11.0-incubating
- Scala 2.11.8, Spark 2.1.0, & Hadoop 2.7.3
- specify these versions in the engine template's configs
~~0.10.0-incubating~~
- no longer supported
- see how to upgrade or apply a temporary fix

Get started with an engine:

Universal Recommender engine
- presented at TrailheaDX 2017
Classification engine
- presented at TrailheaDX 2017 & Dreamforce 2016
Regression engine
- to be presented at Dreamforce 2017
  Open-source at Salesforce booth in the Developer Forest
  3:30-6pm, Wednesday, November 8
Template Gallery
- starting points for many use cases
- follow custom engine docs to use with this buildpack

🐸 How to deploy an engine

Architecture

This buildpack transforms the Scala source-code of a PredictionIO engine into a Heroku app.

Event data can be stored in either of the following:

PredictionIO event storage backed by Heroku PostgreSQL
- compatible with this buildpack's built-in Data Flow features providing initial data load & sync automation
- compatible with most engine templates; required by some (e.g. UR)
- supports RESTful ingestion & querying via PredictionIO's built-in Eventserver
Custom data store, such as Heroku Connect with PostgreSQL, or RDDs/DataFrames stored in HDFS
- requires a highly technical, custom implementation of DataSource.scala
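
As an illustration of the RESTful ingestion mentioned above, here is a minimal sketch of posting one event to an Eventserver with curl. It assumes the Eventserver is reachable over HTTPS and uses the `/events.json?accessKey=...` endpoint from PredictionIO's Event API. The URL, the access key, and the helper names `event_json` and `post_event` are hypothetical placeholders, not part of this buildpack.

```shell
# Hypothetical values: replace with your Eventserver URL and access key.
EVENTSERVER_URL="https://my-eventserver.example.com"
ACCESS_KEY="replace-with-access-key"

# Print one event as JSON, using PredictionIO's Event API field names.
# $1 = event name, $2 = entity type, $3 = entity ID
# Note: values are inserted as-is, so they must not contain quotes.
event_json() {
  printf '{"event":"%s","entityType":"%s","entityId":"%s"}' "$1" "$2" "$3"
}

# POST one event to the Eventserver (hypothetical helper).
post_event() {
  curl --silent --show-error \
    --request POST \
    --header 'Content-Type: application/json' \
    --data "$(event_json "$@")" \
    "${EVENTSERVER_URL}/events.json?accessKey=${ACCESS_KEY}"
}
```

With real values in place, `post_event '$set' user u1` would send a `$set` event for user `u1`. Engines usually expect more fields (for example `properties` or `targetEntityId`), so check the engine template's docs for the events it reads.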

Limitations

Memory

PredictionIO requires at least 2GB of memory. It runs well on Heroku's Performance dynos, which provide 2.5GB or 14GB of RAM. Smaller dynos cannot run PredictionIO reliably.

This buildpack automatically trains the model during release phase, which executes Spark as a sub-process (i.e. --master local) within one-off and web dynos. If the dataset and the operations performed with Spark require more than 14GB of memory, you can point the engine's Spark driver at an existing Spark cluster. (Running a Spark cluster is beyond the scope of this buildpack.) See: customizing environment variables, PIO_SPARK_OPTS & PIO_TRAIN_SPARK_OPTS.
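
To make the cluster option concrete, the sketch below builds a Spark `--master` option and prints the `heroku config:set` command that would store it in both variables. It assumes that PIO_SPARK_OPTS and PIO_TRAIN_SPARK_OPTS are passed through as spark-submit options. The master URL and app name are hypothetical placeholders; consult the environment variable docs for which variable applies to training and which to deployment.

```shell
# Hypothetical values: replace with your Spark master URL and Heroku app name.
SPARK_MASTER_URL="spark://spark.example.com:7077"
HEROKU_APP="my-engine"

# Spark option that replaces the default local master with a remote cluster.
SPARK_OPTS="--master ${SPARK_MASTER_URL}"

# Dry run: print the command for review (the printed line omits the quotes).
# Remove `echo` to apply the config to the app.
echo heroku config:set \
  "PIO_SPARK_OPTS=${SPARK_OPTS}" \
  "PIO_TRAIN_SPARK_OPTS=${SPARK_OPTS}" \
  --app "${HEROKU_APP}"
```

Printing the command first is a deliberate choice: changing config vars on a Heroku app creates a new release, so it is worth reviewing the values before applying them.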

Usage

🐸 Deploy an Engine to Heroku.

🛠 Use the Local Development workflow to set up an engine on your computer.

⏩ Leverage the buildpack's Data Flow to automate import & synchronization of event data.

🤓 Test this buildpack & individual engines.