ray demo

Ray cluster with examples running on Kubernetes (k3d).

Prerequisites

  • python 3
  • k3d to create a k3s kubes cluster (optional)
  • helm to install the kuberay operator (optional)

Create virtualenv:

make install

If needed, create a k3s kubes cluster using k3d (optional):

make cluster

Now set your kube context before running further commands.

Getting started

Install the ray cluster into kubes:

  • kuberay operator: make kuberay raycluster (recommended)
  • stock python operator: make ray-kube-install (deprecated)

Ingress

For k3d, run make k3d-ingress; otherwise, run make forward:

  • The Ray client server will be exposed on localhost port 10001.
  • The Ray dashboard can be accessed on http://localhost:8265/
  • The Ray GCS server will be exposed on localhost port 6379.
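To confirm that the three ports listed above are reachable, a small standard-library check along these lines may help. It is only a sketch: the `RAY_PORTS` mapping and the `port_open` helper are illustrative names, not part of this repo, and an open TCP port does not prove that Ray itself is healthy.

```python
import socket

# Ports taken from the list above; the labels are only used for display.
RAY_PORTS = {
    "client server": 10001,
    "dashboard": 8265,
    "GCS server": 6379,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in RAY_PORTS.items():
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name:14} localhost:{port} {state}")
```

If a port shows as closed, the head pod may not be ready yet, or the ingress / port-forward step may not be running.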

Usage

Ping head node (once pod is ready):

make ping

Run example application:

python raydemo/cluster_info.py

Run shell on head pod:

make shell

Kuberay

Kuberay consists of:

  • helm-chart/ - helm charts for the apiserver, operator and a ray-cluster (recommended)
  • ray-operator/config/ - kustomize templates, which seem more up to date than the helm charts. Includes:
    • crd: the rayclusters, rayjobs, and rayservices CRDs
    • default: crd, rbac, manager, and ray-system namespace
    • manager: kuberay operator deployment and service
    • prometheus
    • rbac: roles, service accounts etc.
  • ray-operator/config/samples: raycluster examples
  • manifests/ - kustomize quickstart manifests for installing the default template + apiserver

make kuberay installs the kuberay-operator helm chart which creates:

CRDs:

  • customresourcedefinition.apiextensions.k8s.io/rayclusters.ray.io created
  • customresourcedefinition.apiextensions.k8s.io/rayjobs.ray.io created
  • customresourcedefinition.apiextensions.k8s.io/rayservices.ray.io created

And the following resources in the default namespace:

  • ServiceAccount kuberay-operator
  • ClusterRole rayjob-editor-role
  • ClusterRole rayjob-viewer-role
  • ClusterRole rayservice-editor-role
  • ClusterRole rayservice-viewer-role
  • ClusterRole kuberay-operator
  • ClusterRoleBinding kuberay-operator
  • Role kuberay-operator
  • RoleBinding kuberay-operator
  • Service kuberay-operator
  • Deployment kuberay-operator
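One way to verify that the three CRDs listed above were installed is to compare them against the output of `kubectl get crd -o name`. The sketch below assumes kubectl is on the PATH and pointed at the right context; `EXPECTED_CRDS`, `missing_crds` and `check_cluster` are hypothetical helper names, not something this repo provides.

```python
import subprocess

# CRD names taken from the list above.
EXPECTED_CRDS = {"rayclusters.ray.io", "rayjobs.ray.io", "rayservices.ray.io"}


def missing_crds(kubectl_output):
    """Return the expected CRDs that are absent from `kubectl get crd -o name` output."""
    found = {
        # Each line looks like "customresourcedefinition.apiextensions.k8s.io/<name>".
        line.strip().split("/", 1)[-1]
        for line in kubectl_output.splitlines()
        if line.strip()
    }
    return EXPECTED_CRDS - found


def check_cluster():
    """Query the current kube context and report on the kuberay CRDs."""
    try:
        result = subprocess.run(
            ["kubectl", "get", "crd", "-o", "name"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (FileNotFoundError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired) as exc:
        return f"could not query cluster: {exc}"
    missing = missing_crds(result.stdout)
    if missing:
        return f"missing CRDs: {sorted(missing)}"
    return "all kuberay CRDs present"


if __name__ == "__main__":
    print(check_cluster())
```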

make raycluster creates the following in the default namespace:

  • raycluster-kuberay-head-svc service
  • ray head pod with limits of 1 CPU and 2Gi memory
  • ray worker pod with limits of 1 CPU and 2Gi memory

make delete removes the ray cluster

For more info see the ray-operator readme.

Examples

See examples in raydemo.

Most examples will start a local ray instance. To use the cluster instead:

export RAY_ADDRESS=ray://127.0.0.1:10001
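The switch works because `ray.init()` called without an address consults the RAY_ADDRESS environment variable, and a `ray://` prefix selects the Ray client. A minimal sketch of that behaviour follows; `ray_target` and `show_cluster_resources` are illustrative names and are not taken from the raydemo examples.

```python
import os


def ray_target(env=None):
    """Describe where ray.init() would connect, based on RAY_ADDRESS."""
    env = os.environ if env is None else env
    address = env.get("RAY_ADDRESS", "")
    if not address:
        return "local instance"
    if address.startswith("ray://"):
        return f"Ray client at {address[len('ray://'):]}"
    return f"cluster at {address}"


def show_cluster_resources():
    """Connect using RAY_ADDRESS (if set) and print the available resources."""
    import ray  # imported lazily so ray_target() works without ray installed

    print("connecting to:", ray_target())
    ray.init()  # no address argument, so RAY_ADDRESS decides the target
    print(ray.cluster_resources())
    ray.shutdown()
```

With the export above in place, calling `show_cluster_resources()` should report the cluster's resources rather than those of a local instance.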

Autoscaler notes

See autoscaler.md

References

Sizing

See [Feature][Docs][Discussion] Provider consistent guidance on resource Request and Limits #744

Known issues