cortex

Deploy, manage, and scale machine learning models in production

Cortex is a cloud native model serving platform for machine learning engineering teams.


Use cases

  • Realtime machine learning - build NLP, computer vision, and other APIs and integrate them into any application.
  • Large-scale inference - scale realtime or batch inference workloads across hundreds or thousands of instances.
  • Consistent MLOps workflows - create streamlined and reproducible MLOps workflows for any machine learning team.

Deploy

  • Deploy TensorFlow, PyTorch, ONNX, and other models using a simple CLI or Python client.
  • Run realtime inference, batch inference, asynchronous inference, and training jobs.
  • Define preprocessing and postprocessing steps in Python and chain workloads seamlessly (a predictor sketch follows the CLI example below).
$ cortex deploy apis.yaml

• creating text-generator (realtime API)
• creating image-classifier (batch API)
• creating video-analyzer (async API)

all APIs are ready!
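
The preprocessing and postprocessing mentioned above live in an ordinary Python module that each API points to. The sketch below is illustrative only: the class and method names (Predictor, predict, the config argument) and the transformers-based model are assumptions standing in for whatever interface and framework the deployed API actually uses.

# predictor.py -- hypothetical handler behind the text-generator realtime API.
# Class and method names are illustrative, not the exact Cortex interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

class Predictor:
    def __init__(self, config):
        # Runs once per replica: load the tokenizer and model into memory.
        self.tokenizer = AutoTokenizer.from_pretrained("gpt2")
        self.model = AutoModelForCausalLM.from_pretrained("gpt2")

    def predict(self, payload):
        # Preprocess the request, run inference, then postprocess the output.
        inputs = self.tokenizer(payload["text"], return_tensors="pt")
        output_ids = self.model.generate(**inputs, max_new_tokens=50)
        return {"text": self.tokenizer.decode(output_ids[0], skip_special_tokens=True)}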

Manage

  • Create A/B tests and shadow pipelines with configurable traffic splitting (a splitter sketch follows the command output below).
  • Automatically stream logs from every workload to your favorite log management tool.
  • Monitor your workloads with pre-built Grafana dashboards and add your own custom dashboards.
$ cortex get

API                 TYPE        GPUs
text-generator      realtime    32
image-classifier    batch       64
video-analyzer      async       16
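
A/B tests and shadow pipelines are configured declaratively: a splitter API sits in front of two or more real APIs and assigns each a share of traffic. The sketch below writes such a spec as a Python dict purely for illustration; the kind, field names, and weight semantics are assumptions rather than the verbatim Cortex schema.

# Hypothetical A/B test: send 90% of requests to the current model and 10% to a
# candidate. Field names mirror the idea of a traffic splitter, not the exact schema.
ab_test_spec = {
    "name": "text-generator",
    "kind": "TrafficSplitter",
    "apis": [
        {"name": "text-generator-v1", "weight": 90},
        {"name": "text-generator-v2", "weight": 10},
    ],
}
# A shadow pipeline would additionally mark one target as shadow-only, so it receives
# a copy of the traffic but its responses are never returned to callers.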

Scale

  • Configure workload and cluster autoscaling to efficiently handle large-scale production workloads.
  • Create clusters with different types of instances for different types of workloads.
  • Spend less on cloud infrastructure by letting Cortex manage spot or preemptible instances (a cluster configuration sketch follows the output below).
$ cortex cluster info

provider: aws
region: us-east-1
instance_types: [c5.xlarge, g4dn.xlarge]
spot_instances: true
min_instances: 10
max_instances: 100
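
The cluster itself is created from a declarative configuration whose fields correspond to the cortex cluster info output above. The sketch below writes that configuration as a Python dict for illustration; the key names simply mirror the fields printed above and are not guaranteed to match the exact cluster-config schema.

# Hypothetical cluster configuration mirroring the `cortex cluster info` fields above.
# In practice this is typically a YAML file (e.g. cluster.yaml); key names are illustrative.
cluster_config = {
    "provider": "aws",
    "region": "us-east-1",
    # Mix CPU (c5.xlarge) and GPU (g4dn.xlarge) instances so each workload can request
    # the hardware it needs.
    "instance_types": ["c5.xlarge", "g4dn.xlarge"],
    # Let Cortex bid on spot instances to reduce compute cost.
    "spot_instances": True,
    # Cluster autoscaling bounds: Cortex adds or removes instances within this range
    # as demand changes.
    "min_instances": 10,
    "max_instances": 100,
}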