Model Serving Made Easy

BentoML

BentoML is an open-source framework for high-performance ML model serving.

What does BentoML do?

  • Create API endpoints that serve trained models with just a few lines of code
  • Support all major machine learning training frameworks
  • High-performance online API serving with adaptive micro-batching support
  • Model Registry for teams, providing a Web UI dashboard and CLI/API access
  • Flexible deployment orchestration with DevOps best practices baked in, supporting Docker, Kubernetes, Kubeflow, Knative, AWS Lambda, SageMaker, Azure ML, GCP and more

👉 To follow development updates and discussion, join the BentoML Slack community and the contributors mailing list.


Why BentoML

Getting machine learning models into production is hard. Data scientists are rarely experts in building production services or in DevOps best practices, and the trained models produced by a data science team are hard to test and hard to deploy. This often leads to a time-consuming and error-prone workflow, in which a pickled model or weights file is handed over to a software engineering team.

BentoML is an end-to-end solution for model serving, making it possible for Data Science teams to build production-ready model serving endpoints, with common DevOps best practices and performance optimizations baked in.

Check out the Frequently Asked Questions page to see how BentoML compares to TensorFlow Serving, Clipper, AWS SageMaker, MLflow, etc.

Getting Started

BentoML requires Python 3.6 or above. Install it with pip:

pip install bentoml

A minimal prediction service in BentoML looks something like this:

# https://github.com/bentoml/BentoML/blob/master/guides/quick-start/iris_classifier.py
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(input=DataframeInput())
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

This code defines a prediction service that bundles a scikit-learn model and provides an API that expects input data in the form of a pandas.DataFrame. The user-defined API function predict defines how the input dataframe will be processed and used for inference with the bundled scikit-learn model. BentoML also supports other API input types such as ImageInput, JsonInput and more.

The following code trains a scikit-learn model and packages the trained model with the IrisClassifier class defined above. It then saves the IrisClassifier instance to disk in the BentoML SavedBundle format:

# https://github.com/bentoml/BentoML/blob/master/guides/quick-start/main.py
from sklearn import svm
from sklearn import datasets

from iris_classifier import IrisClassifier

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Model Training
    clf = svm.SVC(gamma='scale')
    clf.fit(X, y)

    # Create an iris classifier service instance
    iris_classifier_service = IrisClassifier()

    # Pack the newly trained model artifact
    iris_classifier_service.pack('model', clf)

    # Save the prediction service to disk for model serving
    saved_path = iris_classifier_service.save()

By default, BentoML stores SavedBundle files under the ~/bentoml directory. Users can also configure BentoML to use a different directory or cloud storage such as AWS S3 or MinIO via YataiService, BentoML's model management component, which provides advanced model management features including a dashboard web UI.

Learn more about using YataiService for model management and try out the Web UI here.

The BentoML SavedBundle directory contains all the code, data and configs required to deploy the model. To start a REST API model server with the IrisClassifier SavedBundle, use the bentoml serve command:

bentoml serve IrisClassifier:latest

The IrisClassifier model is now served at localhost:5000. Use the curl command to send a prediction request:

curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
  http://localhost:5000/predict
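
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started by bentoml serve is listening on localhost:5000 and that it replies with a JSON body. The helper names build_predict_request and send_predict_request are made up for this example and are not part of BentoML.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust it if the server runs elsewhere.
PREDICT_URL = "http://localhost:5000/predict"


def build_predict_request(rows, url=PREDICT_URL):
    """Build a POST request whose JSON body is a list of feature rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(rows, url=PREDICT_URL, timeout=10):
    """Send the request and return the decoded response.

    Assumption: the server answers with a JSON body.
    """
    request = build_predict_request(rows, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a running API server):
#   send_predict_request([[5.1, 3.5, 1.4, 0.2]])
```

Building the request separately from sending it makes it possible to inspect the method, headers and body without a running server.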

The BentoML API server also provides a web UI for accessing predictions and debugging the server. Visit http://localhost:5000 in the browser and use the Web UI to send a prediction request.

BentoML provides a convenient way to containerize the model API server with Docker:

  1. Find where the SavedBundle directory is created in the file system:

    • The saved path is returned by the iris_classifier_service.save() call
    • The saved path is printed to stdout when saving: INFO - BentoService bundle 'IrisClassifier:20200121114004_360ECB' saved to: ...
    • Use the bentoml get IrisClassifier:latest command to view all the metadata, including the saved path
  2. Run docker build in the SavedBundle directory, which contains a generated Dockerfile. This builds a Docker image containing the IrisClassifier API server:

# If the jq command is not found, install jq (the command-line JSON processor) from: https://stedolan.github.io/jq/download/
saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")

# Build the docker image (the path is quoted in case it contains spaces)
docker build -t iris-classifier "$saved_path"

# Start a container with the image built above
docker run -p 5000:5000 iris-classifier
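
If jq is not available, the saved path can be read with Python's json module instead. This is a rough sketch, assuming only what the jq filter ".uri.uri" already implies: that bentoml get ... -q prints a JSON document with a nested uri.uri field. The function names and the sample document are hypothetical and not part of BentoML.

```python
import json
import subprocess


def extract_saved_path(metadata_json):
    """Return the value at uri -> uri, the field the jq filter ".uri.uri" reads."""
    metadata = json.loads(metadata_json)
    return metadata["uri"]["uri"]


def get_saved_path(tag="IrisClassifier:latest"):
    """Run `bentoml get <tag> -q` and extract the saved path from its output.

    Requires the bentoml CLI on PATH and an existing saved bundle.
    """
    result = subprocess.run(
        ["bentoml", "get", tag, "-q"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return extract_saved_path(result.stdout)


# Hypothetical, heavily trimmed sample; the real metadata has many more fields.
sample = '{"uri": {"uri": "/path/to/saved/bundle"}}'
print(extract_saved_path(sample))
```

The returned path can then be passed to docker build in place of $saved_path.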

This Docker image makes it possible to deploy a BentoML SavedBundle to container orchestration platforms such as Kubeflow, Knative and Kubernetes, which provide advanced model deployment features such as auto-scaling, A/B testing, scale-to-zero, canary rollout and multi-armed bandit.

BentoML can also deploy a SavedBundle directly to cloud services such as AWS Lambda or AWS SageMaker with the bentoml CLI command. For a list of all deployment options with BentoML, check out the BentoML deployment guides.

Documentation

BentoML full documentation: https://docs.bentoml.org/

Examples

Visit the bentoml/gallery repository for more examples and tutorials.

FastAI

Scikit-Learn

PyTorch

Keras (with Tensorflow 1.0)

Tensorflow 2.0 / tf.keras

XGBoost

LightGBM

H2O

FastText

ONNX

Contributing

Have questions or feedback? Post a new GitHub issue or join the discussion in the BentoML Slack channel.

Want to help build BentoML? Check out our contributing guide and the development guide.

Releases

BentoML is under active development and is evolving rapidly. It is currently a beta release, and APIs may change in future releases.

Read more about the latest features and changes in BentoML from the releases page.

Usage Tracking

By default, BentoML collects anonymous usage data using Amplitude. It collects only the BentoML library's own actions and parameters; no user or model data is collected. Here is the code that does it.

This helps the BentoML team understand how the community is using the tool and what to build next. You can opt out of usage tracking with either of the following:

# From terminal:
bentoml config set usage_tracking=false
# From python:
import bentoml
bentoml.config().set('core', 'usage_tracking', 'False')

License

Apache License 2.0
