predictionio-buildpack

Deploy predictive query engines built with PredictionIO, an open-source machine learning framework.


Heroku buildpack for PredictionIO

Enables data scientists and developers to deploy custom machine learning services created with PredictionIO.

This buildpack is part of an exploration into utilizing the Heroku developer experience to simplify data science operations. When considering this proof-of-concept technology, please note its current limitations. We'd love to hear from you. Open issues on this repo with feedback and questions.

Releases

September 28th, 2017: PredictionIO 0.12.0-incubating is now supported.

See all releases with their changes.

Engines

Create and deploy engines for PredictionIO versions:

Get started with an engine:

🐸 How to deploy an engine

Architecture

This buildpack transforms the Scala source-code of a PredictionIO engine into a Heroku app.

[Diagram: deployment to Heroku Common Runtime]

The events data can be stored in:

  • PredictionIO event storage backed by Heroku PostgreSQL
    • compatible with this buildpack's built-in Data Flow features providing initial data load & sync automation
    • compatible with most engine templates; required by some (e.g. UR)
    • supports RESTful ingestion & querying via PredictionIO's built-in Eventserver
  • custom data store such as Heroku Connect with PostgreSQL or RDD/DataFrames stored in HDFS
    • requires a highly technical, custom implementation of DataSource.scala
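As a rough illustration of the RESTful ingestion mentioned above, the sketch below posts a single event to an Eventserver using PredictionIO's Event API (`POST /events.json` with an `accessKey` query parameter). The URL, access key, and event fields are placeholders chosen for this example, not values from this buildpack; substitute the URL of your own Eventserver and the access key of your PredictionIO app.

```shell
# Placeholder values (assumptions for illustration only).
EVENTSERVER_URL="${EVENTSERVER_URL:-http://localhost:7070}"
ACCESS_KEY="${ACCESS_KEY:-replace-with-your-access-key}"

# A minimal event: user "u1" viewed item "i1".
EVENT_JSON='{"event":"view","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"i1"}'

# Send one JSON event to the Eventserver's Event API.
send_event() {
  curl -sS -X POST "${EVENTSERVER_URL}/events.json?accessKey=${ACCESS_KEY}" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Uncomment once EVENTSERVER_URL and ACCESS_KEY point at a real Eventserver:
# send_event "$EVENT_JSON"
```

The call is left commented out so the snippet can be sourced safely before real credentials are in place.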

Limitations

Memory

PredictionIO requires 2GB of memory. It runs well on Heroku's Performance dynos, which provide 2.5GB or 14GB of RAM. Smaller dynos cannot run PredictionIO reliably.

This buildpack automatically trains the model during release phase, which executes Spark as a sub-process (i.e. --master local) within one-off and web dynos. If the dataset and the Spark operations performed on it require more than 14GB of memory, the engine's Spark driver can be pointed at an existing Spark cluster. (Running a Spark cluster is beyond the scope of this buildpack.) See: customizing environment variables, PIO_SPARK_OPTS & PIO_TRAIN_SPARK_OPTS.
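A minimal sketch of pointing training at an external cluster, assuming PIO_TRAIN_SPARK_OPTS and PIO_SPARK_OPTS accept spark-submit style options such as `--master`. The app name and Spark master URL are hypothetical placeholders; check the buildpack's environment variable documentation for the exact options it supports.

```shell
# Hypothetical values: replace with your own Heroku app and Spark master URL.
APP_NAME="my-pio-engine"
SPARK_MASTER="spark://spark.example.com:7077"

# Options for training; assumed to be passed through to Spark.
TRAIN_OPTS="--master ${SPARK_MASTER}"

# Set the config var on the app with the Heroku CLI.
apply_train_opts() {
  heroku config:set "PIO_TRAIN_SPARK_OPTS=${TRAIN_OPTS}" --app "${APP_NAME}"
}

# Uncomment after logging in with the Heroku CLI:
# apply_train_opts
```

Changing a config var creates a new release of the app. The same pattern would apply to PIO_SPARK_OPTS.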

Usage

🐸 Deploy an Engine to Heroku.

🛠 Use the Local Development workflow to set up an engine on your computer.

Leverage the buildpack's Data Flow to automate import & synchronization of event data.

🤓 Testing this buildpack & individual engines.