/kuberay

A toolkit to run Ray applications on Kubernetes

Primary LanguageGoApache License 2.0Apache-2.0

KubeRay

Build Status Go Report Card

KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. It offers several key components:

KubeRay core: This is the official, fully-maintained component of KubeRay that provides three custom resource definitions, RayCluster, RayJob, and RayService. These resources are designed to help you run a wide range of workloads with ease.

  • RayCluster: KubeRay fully manages the lifecycle of RayCluster, including cluster creation/deletion, autoscaling, and ensuring fault tolerance.

  • RayJob: With RayJob, KubeRay automatically creates a RayCluster and submits a job when the cluster is ready. You can also configure RayJob to automatically delete the RayCluster once the job finishes.

  • RayService: RayService is made up of two parts: a RayCluster and a Ray Serve deployment graph. RayService offers zero-downtime upgrades for RayCluster and high availability.

Community-managed components (optional): Some components are maintained by the KubeRay community.

  • KubeRay APIServer: It provides a layer of simplified configuration for KubeRay resources. The KubeRay API server is used internally by some organizations to back user interfaces for KubeRay resource management.

  • KubeRay Python client: This Python client library provides APIs to handle RayCluster from your Python application.

  • KubeRay CLI: KubeRay CLI provides the ability to manage KubeRay resources through command-line interface.

KubeRay ecosystem

Documentation

You can view detailed documentation and guides at https://ray-project.github.io/kuberay/.

We also recommend checking out the official Ray guides for deploying on Kubernetes at https://docs.ray.io/en/latest/cluster/kubernetes/index.html.

Quick Start

  • Try this end-to-end example!
  • Please choose the version you would like to install. The examples below use the latest stable version v0.5.0.
Version Stable Suggested Kubernetes Version
master N v1.19 - v1.25
v0.5.0 Y v1.19 - v1.25

Use YAML

Make sure your Kubernetes and Kubectl versions are both within the suggested range. Once you have connected to a Kubernetes cluster, run the following commands to deploy the KubeRay Operator.

# case 1: kubectl >= v1.22.0
export KUBERAY_VERSION=v0.5.0
kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=${KUBERAY_VERSION}&timeout=90s"

# case 2: kubectl < v1.22.0
# Clone KubeRay repository and checkout to the desired branch e.g. `release-0.5`.
kubectl create -k ray-operator/config/default

To deploy both the KubeRay Operator and the optional KubeRay API Server run the following commands.

# case 1: kubectl >= v1.22.0
export KUBERAY_VERSION=v0.5.0
kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=${KUBERAY_VERSION}&timeout=90s"
kubectl apply -k "github.com/ray-project/kuberay/manifests/base?ref=${KUBERAY_VERSION}&timeout=90s"

# case 2: kubectl < v1.22.0
# Clone KubeRay repository and checkout to the desired branch e.g. `release-0.4`.
kubectl create -k manifests/cluster-scope-resources
kubectl apply -k manifests/base

Observe that we must use kubectl create to install cluster-scoped resources. The corresponding kubectl apply command will not work. See KubeRay issue #271.

Use Helm (Helm v3+)

A Helm chart is a collection of files that describe a related set of Kubernetes resources. It can help users to deploy the KubeRay Operator and Ray clusters conveniently. Please read kuberay-operator to deploy the operator and ray-cluster to deploy a configurable Ray cluster. To deploy the optional KubeRay API Server, see kuberay-apiserver.

helm repo add kuberay https://ray-project.github.io/kuberay-helm/

# Install both CRDs and KubeRay operator v0.5.0.
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0

# Check the KubeRay operator Pod in `default` namespace
kubectl get pods
# NAME                                READY   STATUS    RESTARTS   AGE
# kuberay-operator-6fcbb94f64-mbfnr   1/1     Running   0          17s

Development

Please read our CONTRIBUTING guide before making a pull request. Refer to our DEVELOPMENT to build and run tests locally.

Getting involved

Kuberay has an active community of developers. Here’s how to get involved with the Kuberay community:

Join our community: Join Ray community slack (search for Kuberay channel) or use our discussion board to ask questions and get answers.

Security

If you discover a potential security issue in this project, or think you may have discovered a security issue, we ask that you notify KubeRay Security via our Slack Channel. Please do not create a public GitHub issue.

License

This project is licensed under the Apache-2.0 License.