/cortex

Deploy machine learning models to production

Primary LanguageGoApache License 2.0Apache-2.0


installdocumentationexamplessupport

Deploy machine learning models to production

Cortex is an open source platform for deploying, managing, and scaling machine learning in production.


Model serving infrastructure

  • Supports deploying TensorFlow, PyTorch, sklearn and other models as realtime or batch APIs
  • Ensures high availability with availability zones and automated instance restarts
  • Scales to handle production workloads with request-based autoscaling
  • Runs inference on spot instances with on-demand backups
  • Manages traffic splitting for A/B testing

Configure your cluster:

# cluster.yaml

region: us-east-1
availability_zones: [us-east-1a, us-east-1b]
api_gateway: public
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true

Spin up your cluster on your AWS account:

$ cortex cluster up --config cluster.yaml

○ configuring autoscaling ✓
○ configuring networking ✓
○ configuring logging ✓
○ configuring metrics dashboard ✓

cortex is ready!

Reproducible model deployments

  • Implement request handling in Python
  • Customize compute, autoscaling, and networking for each API
  • Package dependencies, code, and configuration for reproducible deployments
  • Test locally before deploying to your cluster

Implement a predictor:

# predictor.py

from transformers import pipeline

class PythonPredictor:
  def __init__(self, config):
    self.model = pipeline(task="text-generation")

  def predict(self, payload):
    return self.model(payload["text"])[0]

Configure an API:

# cortex.yaml

name: text-generator
kind: RealtimeAPI
predictor:
  path: predictor.py
compute:
  gpu: 1
  mem: 4Gi
autoscaling:
  min_replicas: 1
  max_replicas: 10
networking:
  api_gateway: public

Deploy to production:

$ cortex deploy cortex.yaml

creating https://example.com/text-generator

$ curl https://example.com/text-generator \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "deploy machine learning models to"}'

"deploy machine learning models to production"

API management

  • Monitor API performance
  • Aggregate and stream logs
  • Customize prediction tracking
  • Update APIs without downtime

Manage your APIs:

$ cortex get

realtime api       status     replicas   last update   latency   requests

text-generator     live       34         9h            247ms     71828
object-detector    live       13         15h           23ms      828459


batch api          running jobs   last update

image-classifier   5              10h

Get started

$ pip install cortex

See the installation guide for next steps.