Heroku buildpack for PredictionIO
Enables data scientists and developers to deploy custom machine learning services created with PredictionIO.
This buildpack is part of an exploration into utilizing the Heroku developer experience to simplify data science operations. When considering this proof-of-concept technology, please note its current limitations. We'd love to hear from you. Open issues on this repo with feedback and questions.
Releases
September 28th, 2017: PredictionIO 0.12.0-incubating is now supported.
See all releases with their changes.
Engines
Create and deploy engines for PredictionIO versions:
- 0.12.0-incubating
  - Scala 2.11.8, Spark 2.1.1, & Hadoop 2.7.3
  - specify these versions in the engine template's configs
- 0.11.0-incubating
  - Scala 2.11.8, Spark 2.1.0, & Hadoop 2.7.3
  - specify these versions in the engine template's configs
- 0.10.0-incubating: no longer supported
  - see how to upgrade or temporarily fix
Get started with an engine:
- Universal Recommender engine
  - presented at TrailheaDX 2017
- Classification engine
  - presented at TrailheaDX 2017 & Dreamforce 2016
- Regression engine
  - to be presented at Dreamforce 2017: Open-source at Salesforce booth in the Developer Forest, 3:30-6pm, Wednesday, November 8
- Template Gallery
  - starting-points for many use-cases
  - follow custom engine docs to use with this buildpack
Architecture
This buildpack transforms the Scala source-code of a PredictionIO engine into a Heroku app.
The events data can be stored in:
- PredictionIO event storage backed by Heroku PostgreSQL
  - compatible with this buildpack's built-in Data Flow features providing initial data load & sync automation
  - compatible with most engine templates; required by some (e.g. UR)
  - supports RESTful ingestion & querying via PredictionIO's built-in Eventserver
- custom data store such as Heroku Connect with PostgreSQL or RDD/DataFrames stored in HDFS
  - requires a highly technical, custom implementation of DataSource.scala
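To illustrate the RESTful ingestion mentioned above, here is a minimal sketch of posting one event to an Eventserver using PredictionIO's event JSON format. The Eventserver URL, access key, and the event's field values are placeholders chosen for illustration, and `post_event` is a hypothetical helper name, not something the buildpack provides.

```shell
# Placeholders (assumptions): substitute your own Eventserver URL and access key.
EVENTSERVER_URL="https://my-eventserver.example.com"
ACCESS_KEY="YOUR_ACCESS_KEY"

# One illustrative event: user "u1" rates item "i1".
EVENT_JSON='{"event":"rate","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"i1","properties":{"rating":4}}'

# Hypothetical helper: POST a JSON event (first argument) to the Eventserver.
post_event() {
  curl -sS -X POST "${EVENTSERVER_URL}/events.json?accessKey=${ACCESS_KEY}" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Usage, once the placeholders are replaced:
#   post_event "$EVENT_JSON"
```

The function is defined but not called, so nothing is sent until the placeholders are replaced with real values.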
Limitations
Memory
PredictionIO requires 2GB of memory. It runs well on Heroku's Performance dynos with 2.5GB or 14GB RAM. Smaller dynos cannot run PredictionIO reliably.
This buildpack automatically trains the model during release phase, which executes Spark as a sub-process (i.e. --master local) within one-off and web dynos. If the dataset and operations performed with Spark require more than 14GB memory, then it's possible to point the engine's Spark driver at an existing Spark cluster. (Running a Spark cluster is beyond the scope of this buildpack.) See: customizing environment variables, PIO_SPARK_OPTS & PIO_TRAIN_SPARK_OPTS.
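As a rough sketch of pointing the Spark driver at an existing cluster, the script below builds Heroku config commands for both variables. The master URL and app name are placeholders, and the sketch assumes both variables accept spark-submit style options such as `--master`; check the environment variable docs before relying on it. The script only prints the commands so they can be reviewed first.

```shell
# Hypothetical cluster address; replace with your own Spark master URL.
SPARK_MASTER_URL="spark://spark.example.com:7077"

# Assumption: both variables take spark-submit style options.
SPARK_OPTS="--master ${SPARK_MASTER_URL}"

# Print the Heroku CLI commands for review instead of running them.
# "my-engine-app" is a placeholder app name.
for VAR in PIO_SPARK_OPTS PIO_TRAIN_SPARK_OPTS; do
  echo "heroku config:set ${VAR}='${SPARK_OPTS}' --app my-engine-app"
done
```

Running the printed commands changes the app's config vars, which Heroku applies by creating a new release.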