Beta9 is an open-source platform for running scalable serverless GPU workloads across cloud providers.
Features:
- Scale out workloads to thousands of GPU (or CPU) containers
- Ultrafast cold-start for custom ML models
- Automatic scale-to-zero so you pay only for what you use
- Flexible distributed storage for storing models and function outputs
- Distribute workloads across multiple cloud providers
- Easily deploy task queues and functions using simple Python abstractions
We use Beta9 internally at Beam to run AI applications for our users at scale. Here's an example of deploying a serverless inference endpoint:
```python
from beta9 import Image, endpoint


@endpoint(
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(
        python_packages=[
            "vllm==0.4.1",
        ],  # These dependencies will be installed in your remote container
    ),
)
def predict():
    from vllm import LLM

    prompts = ["The future of AI is"]
    llm = LLM(model="facebook/opt-125m")
    output = llm.generate(prompts)[0]
    return {"prediction": output.outputs[0].text}
```
Deploy it with a single CLI command:

```bash
$ beta9 deploy app.py:predict --name llm-inference
=> Building image
=> Using cached image
=> Deploying endpoint
=> Deployed 🎉
=> Invocation details

curl -X POST 'https://app.beam.cloud/endpoint/llm-inference/v1' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
  -d '{}'
```
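You can hit the deployed endpoint from any HTTP client. Here's a minimal sketch in Python, assuming a hypothetical `BETA9_TOKEN` environment variable holds your auth token:

```python
import os

import requests

# BETA9_TOKEN is a placeholder; use however you normally manage secrets
token = os.environ["BETA9_TOKEN"]

resp = requests.post(
    "https://app.beam.cloud/endpoint/llm-inference/v1",
    headers={"Authorization": f"Bearer {token}"},
    json={},  # predict() takes no arguments, so an empty body works
)
print(resp.json())  # e.g. {"prediction": "..."}
```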
You can also parallelize ordinary Python functions across many remote containers:

```python
from beta9 import function


# This decorator allows you to parallelize this function
# across multiple remote containers
@function(cpu=1, memory=128)
def square(i: int):
    return i**2


def main():
    numbers = list(range(100))
    squared = []

    # Run a remote container for every item in the list
    for result in square.map(numbers):
        squared.append(result)


if __name__ == "__main__":
    main()
```
For asynchronous workloads, deploy a task queue instead:

```python
from beta9 import task_queue, Image


@task_queue(
    cpu=1.0,
    memory=128,
    gpu="T4",
    image=Image(python_packages=["torch"]),
    keep_warm_seconds=1000,
)
def multiply(x):
    result = x * 2
    return {"result": result}


# Manually insert a task into the queue
multiply.put(x=10)
```
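Since `put()` only enqueues work, you can submit a batch of tasks without blocking on results. A small sketch using just the `put()` call shown above:

```python
# Enqueue ten tasks; each one is processed in a remote container
for x in range(10):
    multiply.put(x=x)
```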
Beta9 is designed for launching remote serverless containers quickly. There are a few things that make this possible:
- A custom, lazy loading image format (CLIP) backed by S3/FUSE
- A fast, Redis-based container scheduling engine
- Content-addressed storage for caching images and files
- A custom runc container runtime
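As a rough illustration of the content-addressing idea (a conceptual sketch, not Beta9's actual implementation), a store keyed by a hash of the content means identical images or files are written and cached exactly once:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("/tmp/cas-cache")  # illustrative location only


def cache_put(data: bytes) -> str:
    """Store data under a key derived from its own bytes."""
    key = hashlib.sha256(data).hexdigest()
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / key
    if not path.exists():  # identical content is never stored twice
        path.write_bytes(data)
    return key


def cache_get(key: str) -> bytes:
    """Fetch data back by its content hash."""
    return (CACHE_DIR / key).read_bytes()
```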
The fastest and most reliable way to get started is by signing up for our managed service, Beam Cloud. Your first 10 hours of usage are free, and afterwards you pay based on usage.
You can run Beta9 locally, or in an existing Kubernetes cluster using our Helm chart.
k3d is used for local development. You'll need Docker and Make to get started. To use our fully automated setup, run the `setup` make target.

> **Note**
> This will overwrite some of the tools you may already have installed. Review `setup.sh` to learn more.

```bash
make setup
```
The SDK is written in Python. You'll need Python 3.8 or higher. Use the `setup-sdk` make target to get started.

> **Note**
> This will install the Poetry package manager.

```bash
make setup-sdk
```
After you've set up the server and SDK, check out the SDK readme here.
We welcome contributions, big or small! These are the most helpful things for us:
- Rank features in our roadmap
- Open a PR
- Submit a feature request or bug report
If you need support, you can reach out through any of these channels:
- Slack (Chat live with maintainers and community members)
- GitHub issues (Bug reports, feature requests, and anything roadmap related)
- Twitter (Updates on releases and announcements)