NOS (torch-nos) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running inference of popular foundational AI models.
- 👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
- 🥷 Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
- 🔌 Pluggable: Plug your front-end to NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding all kinds of ML model deployment hassles.
- 🚀 Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
- 📦 Extensible: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
- ⚙️ HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
- ☁️ Cloud-agnostic: Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
NOS inherits its name from Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.
Get started with the full NOS server by installing via pip:
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ conda install "pytorch>=2.0.1" torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
$ pip install "torch-nos[server]"
If you only need the lightweight NOS client and want to run inference on your local machine (via Docker), you can install the client-only package:
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ pip install torch-nos
gRPC API ⚡

from nos.client import Client
client = Client("[::]:50051")
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["fox jumped over the moon"],
              width=1024, height=1024, num_images=1)

REST API

curl \
  -X POST http://localhost:8000/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
      "prompts": ["fox jumped over the moon"],
      "width": 1024,
      "height": 1024,
      "num_images": 1
    }
  }'
gRPC API ⚡

from nos.client import Client
client = Client("[::]:50051")
clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])

REST API

curl \
  -X POST http://localhost:8000/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "openai/clip-vit-base-patch32",
    "method": "encode_text",
    "inputs": {
      "texts": ["fox jumped over the moon"]
    }
  }'
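The two curl calls above share one payload shape: a `model_id`, an `inputs` object, and an optional `method`. As a rough sketch, the same requests can be assembled from Python with only the standard library. The helper name `build_infer_request` is hypothetical and not part of NOS, and the response format is not documented here, so the sending step is left commented out.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: the REST server is reachable at the URL used in the curl examples.
NOS_REST_URL = "http://localhost:8000/infer"


def build_infer_request(
    model_id: str,
    inputs: Dict[str, Any],
    method: Optional[str] = None,
    url: str = NOS_REST_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl examples.

    Hypothetical helper for illustration; it is not part of the NOS client.
    """
    payload: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        # Only the CLIP example names a method explicitly ("encode_text").
        payload["method"] = method
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as the Stable Diffusion XL curl example (no "method" key).
sdxl_request = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["fox jumped over the moon"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)

# Same payload as the CLIP curl example (explicit "method").
clip_request = build_infer_request(
    "openai/clip-vit-base-patch32",
    {"texts": ["fox jumped over the moon"]},
    method="encode_text",
)

# Sending requires a running NOS server, so it is left commented out:
# with urllib.request.urlopen(clip_request) as response:
#     body = response.read()
```

Building the request separately from sending it makes it easy to inspect the JSON body before pointing it at a live server.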
├── docker # Dockerfile for CPU/GPU servers
├── docs # mkdocs documentation
├── examples # example guides, jupyter notebooks, demos
├── makefiles # makefiles for building/testing
├── nos
│ ├── cli # CLI (hub, system)
│ ├── client # gRPC / REST client
│ ├── common # common utilities
│ ├── executors # runtime executor (i.e. Ray)
│ ├── hub # hub utilities
│ ├── managers # model manager / multiplexer
│ ├── models # model zoo
│ ├── proto # protobuf defs for NOS gRPC service
│ ├── server # server backend (gRPC)
│ └── test # pytest utilities
├── requirements # requirement extras (server, docs, tests)
├── scripts # basic scripts
└── tests # pytests (client, server, benchmark)
- Quickstart
- Models
- Concepts: Architecture Overview, ModelSpec, ModelManager, Runtime Environments
- Demos: Building a Discord Image Generation Bot, Video Search Demo
- Commodity GPUs
    - NVIDIA GPUs (20XX, 30XX, 40XX)
    - AMD GPUs (RX 7000)
- Cloud GPUs
    - NVIDIA (H100, A100, A10G, A30G, T4, L4)
    - AMD (MI200, MI250)
- Cloud Service Providers (via SkyPilot)
    - AWS, GCP, Azure
    - Opinionated Cloud: Lambda Labs, RunPod, etc.
- Cloud ASICs
    - AWS Inferentia (Inf1/Inf2)
    - Google TPU
    - Coming soon! (Habana Gaudi, Tenstorrent)
This project is licensed under the Apache-2.0 License.
NOS collects anonymous usage data using Sentry. This helps us understand how the community is using NOS and prioritize features. You can opt out of telemetry by setting NOS_TELEMETRY_ENABLED=0.
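For scripts that launch NOS from Python, the opt-out can also be set in the process environment. This is a minimal sketch that assumes the variable is read when NOS starts, so it has to be set before NOS is imported or launched; whether the value reaches a separately started server container depends on how that container is run. The helper `telemetry_opted_out` is hypothetical and only checks the local environment.

```python
import os

# Assumption: NOS reads this variable at startup, so set it before importing
# or launching NOS in this process.
os.environ["NOS_TELEMETRY_ENABLED"] = "0"


def telemetry_opted_out() -> bool:
    """Return True if the opt-out variable is set to "0" in this process.

    Hypothetical helper: it inspects the environment only, not NOS itself.
    """
    return os.environ.get("NOS_TELEMETRY_ENABLED") == "0"
```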
We welcome contributions! Please see our contributing guide for more information.
- 💬 Send us an email at support@autonomi.ai or join our Discord for help.
- 📣 Follow us on Twitter and LinkedIn to keep up-to-date on our products.