Cortex is an open source platform that takes machine learning models—trained with nearly any framework—and turns them into production web APIs in one command.
install • tutorial • docs • examples • we're hiring • email us • chat with us
- Autoscaling: Cortex automatically scales APIs to handle production workloads.
- Multi-framework: Cortex supports TensorFlow, PyTorch, scikit-learn, XGBoost, and more.
- CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.
- Rolling updates: Cortex updates deployed APIs without any downtime.
- Log streaming: Cortex streams logs from deployed models to your CLI.
- Prediction monitoring: Cortex monitors network metrics and tracks predictions.
- Minimal configuration: Deployments are defined in a single `cortex.yaml` file.
```python
# predictor.py

model = download_my_model()

def predict(sample, metadata):
    return model.predict(sample["text"])
```
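The predictor interface can be exercised locally before deploying. A minimal sketch, in which `DummyModel` and the toy `download_my_model` below are hypothetical stand-ins for a real trained model and its loading code:

```python
# local_check.py -- hypothetical harness for the predictor interface above.

class DummyModel:
    """Toy rule-based stand-in for a real sentiment model."""
    def predict(self, text):
        return "positive" if "great" in text else "negative"

def download_my_model():
    # A real implementation would fetch and deserialize your trained model.
    return DummyModel()

model = download_my_model()

def predict(sample, metadata):
    # Cortex passes the parsed JSON request body as `sample`.
    return model.predict(sample["text"])
```

Calling `predict({"text": "the movie was great!"}, metadata={})` exercises the same code path that a deployed API request would.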
```yaml
# cortex.yaml

- kind: deployment
  name: sentiment

- kind: api
  name: classifier
  predictor:
    path: predictor.py
  tracker:
    model_type: classification
  compute:
    gpu: 1
```
```bash
$ cortex deploy

creating classifier (http://***.amazonaws.com/sentiment/classifier)
```
```bash
$ curl http://***.amazonaws.com/sentiment/classifier \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "the movie was great!"}'

positive
```
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            123ms

class     count
positive  8
negative  4
```
The CLI sends configuration and code to the cluster every time you run `cortex deploy`. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), Flask, TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.
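To illustrate the JSON-in, prediction-out contract each container fulfills (this is not Cortex's actual serving code, which uses Flask and a load balancer), a predictor can be exposed over HTTP with nothing but the standard library; the inline `predict` here is a hypothetical stand-in:

```python
# serve_sketch.py -- hypothetical; shows only the request/response contract.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(sample, metadata):
    # Stand-in predictor; a real one would call model.predict(...).
    return "positive" if "great" in sample["text"] else "negative"

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON request body, run the predictor, return plain text.
        length = int(self.headers["Content-Length"])
        sample = json.loads(self.rfile.read(length))
        body = predict(sample, metadata={}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve locally: HTTPServer(("", 8080), Handler).serve_forever()
```

A `curl` against this sketch behaves like the deployed endpoint above: POST a JSON body, get the predicted class back.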
- Sentiment analysis in TensorFlow with BERT
- Image classification in TensorFlow with Inception
- Text generation in PyTorch with DistilGPT2
- Iris classification in XGBoost / ONNX