[Bug] Issues with RayCluster CRD and kubectl apply
DmitriGekhtman opened this issue · 11 comments
Search before asking
- I searched the issues and found no similar issues.
KubeRay Component
Others
What happened + What you expected to happen
kubectl apply -k manifests/cluster-scope-resources
yields the error
The CustomResourceDefinition "rayclusters.ray.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes.
Reason:
After the KubeRay CRD was re-generated in #268, it picked up some pod template fields from recent versions of K8s. Now the CRD is too big to fit in the kubectl.kubernetes.io/last-applied-configuration annotation that kubectl apply writes into metadata.annotations.
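A rough way to see whether a manifest is likely to hit this limit is to compare its size on disk with the 262144-byte cap from the error message. This is only a sketch: the function name and the example path are made up for illustration, and the file's YAML size only approximates the JSON that kubectl apply actually stores in the annotation.

```shell
# Size cap for metadata.annotations, taken from the error message above.
ANNOTATION_LIMIT_BYTES=262144

# Prints "too-large" if the file is bigger than the cap, otherwise "ok".
# Heuristic only: kubectl stores a JSON rendering of the object, not this file.
check_manifest_size() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$ANNOTATION_LIMIT_BYTES" ]; then
    echo "too-large"
  else
    echo "ok"
  fi
}

# Example (hypothetical path):
#   check_manifest_size path/to/raycluster-crd.yaml
```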
The solution I'd propose is to move the CRD out of the kustomization file and advise users to kubectl create the CRD before installing the rest of the cluster-scoped resources.
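As a minimal sketch of the create-based install, assuming the kustomization directory from the command above: kubectl create does not record the last-applied-configuration annotation, so the size limit is not hit. The wrapper function name is hypothetical, and the layout after moving the CRD out of the kustomization file is not specified here.

```shell
# Install resources with `kubectl create` instead of `kubectl apply`.
# `create` does not add the last-applied-configuration annotation.
# Note: `create` fails for resources that already exist in the cluster.
install_cluster_scope_resources() {
  kubectl create -k "$1"
}

# Usage (directory taken from the report above):
#   install_cluster_scope_resources manifests/cluster-scope-resources
```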
Reproduction script
See above.
Anything else
After running kubectl apply -k, I tried to kubectl delete -k so that I could subsequently kubectl create -k.
Unfortunately, my ray-system namespace is hanging in a terminating state!
edit: My ray-system namespace is hanging simply because my cluster is 100% borked.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
In case this helps other people using ArgoCD to deploy KubeRay, we solved this issue using a Kustomization and patching the RayCluster CRD with the annotation argocd.argoproj.io/sync-options: Replace=true to make ArgoCD use kubectl replace instead of kubectl apply when syncing this particular resource:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://github.com/ray-project/kuberay/manifests/cluster-scope-resources/?ref=master
- https://github.com/ray-project/kuberay/manifests/base/?ref=master
patchesStrategicMerge:
# CRD rayclusters.ray.io manifest is too big to fit in the
# annotation `kubectl.kubernetes.io/last-applied-configuration`
# added by `kubectl apply` used by ArgoCD, and so it fails
# https://github.com/ray-project/kuberay/issues/271
# Annotate this CRD to make ArgoCD use `kubectl replace` and avoid the error when syncing it
- |-
  apiVersion: apiextensions.k8s.io/v1
  kind: CustomResourceDefinition
  metadata:
    name: rayclusters.ray.io
    annotations:
      argocd.argoproj.io/sync-options: Replace=true
I have the same issue.
We'll start by replacing "apply" in the docs with "create". Then we'll look into shrinking the CRD.
It seems this bug comes up from time to time in various K8s projects...
Also, to extend this: should we show status and restarts for running clusters?
$ kubectl get rayclusters
NAME                  AGE
raycluster-complete   7m48s
it used to be
$ kubectl -n ray get rayclusters
NAME              STATUS    RESTARTS   AGE
example-cluster   Running   0          53s
Status could make sense -- it would simply indicate the status of the head pod.
Restarts are a bit flimsier as a notion because we don't quite have a coherent notion of what constitutes a restart -- I guess that would mean the number of head container restarts + the number of head pod replacements.
We could potentially take a look at what the K8s deployment controller does.
Try using kubectl apply --server-side
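A minimal sketch of that suggestion, assuming the same kustomization directory as in the original report. With server-side apply, field ownership is tracked by the API server in managedFields, so kubectl does not write the last-applied-configuration annotation. The wrapper function name is hypothetical.

```shell
# Apply a kustomization using server-side apply.
# The API server tracks field ownership in managedFields, so the
# client-side last-applied-configuration annotation is not written.
apply_server_side() {
  kubectl apply --server-side -k "$1"
}

# Usage (directory taken from the original report):
#   apply_server_side manifests/cluster-scope-resources
```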
For the Argo CD users, maybe we can add some instructions into the document?
Just like I did for the Flink operator project
https://github.com/apache/flink-kubernetes-operator/blob/main/docs/content/docs/operations/helm.md#working-with-argo-cd
@haoxins
That sounds good.
If you have a working set-up with Argo CD / Helm / KubeRay, feel free to open a PR adding the relevant info to the README!
https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/README.md
We could update the docs to mention that kubectl apply --server-side works.
I think for the moment, the only actionable item is the documentation item described in the last comment.
Going to remove the 0.4.0 milestone label from this issue because docs are not currently versioned.