[BUG] spark-operator v1beta2-1.4.2-3.5.0 install with helm timeout
Opened this issue · 4 comments
Description
- ✋ I have searched the open/closed issues and my issue is not listed.
Reproduction Code [Required]
I encountered the problem in GitHub CI/CD with the following job:
create-cluster:
  runs-on: ubuntu-latest
  steps:
    - name: Checkout current branch (full)
      uses: actions/checkout@v4
      with:
        fetch-depth: 0
    - name: Create kind cluster
      uses: helm/kind-action@v1
      with:
        config: ./kind/k8s_config/kind-config.yaml
    - name: Helm install
      run: |
        helm repo add spark-operator https://kubeflow.github.io/spark-operator
        helm search repo spark-operator
        helm repo update
        helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set webhook.enable=true --debug
Expected behavior
Successful spark-operator install
Actual behavior
The installation times out after 5 minutes.
Terminal Output Screenshot(s)
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm search repo spark-operator
helm repo update
helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set webhook.enable=true --debug
shell: /usr/bin/bash -e {0}
"spark-operator" has been added to your repositories
NAME CHART VERSION APP VERSION DESCRIPTION
spark-operator/spark-operator 1.3.0 v1beta2-1.4.2-3.5.0 A Helm chart for Spark on Kubernetes operator
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "spark-operator" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
install.go:218: [debug] Original chart version: ""
install.go:235: [debug] CHART PATH: /home/runner/.cache/helm/repository/spark-operator-1.3.0.tgz
client.go:142: [debug] creating 1 resource(s)
client.go:142: [debug] creating 1 resource(s)
wait.go:48: [debug] beginning wait for 2 resources with timeout of 1m0s
install.go:205: [debug] Clearing REST mapper cache
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator" ServiceAccount
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator" /v1, Kind=ServiceAccount: serviceaccounts "my-release-spark-operator" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator" ClusterRole
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator" rbac.authorization.k8s.io/v1, Kind=ClusterRole: clusterroles.rbac.authorization.k8s.io "my-release-spark-operator" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator" ClusterRoleBinding
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator" rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io "my-release-spark-operator" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "my-release-spark-operator-webhook-init" Job
client.go:490: [debug] Ignoring delete failure for "my-release-spark-operator-webhook-init" batch/v1, Kind=Job: jobs.batch "my-release-spark-operator-webhook-init" not found
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 5m0s
client.go:142: [debug] creating 1 resource(s)
client.go:712: [debug] Watching for changes to Job my-release-spark-operator-webhook-init with timeout of 5m0s
client.go:740: [debug] Add/Modify event for my-release-spark-operator-webhook-init: ADDED
client.go:779: [debug] my-release-spark-operator-webhook-init: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:740: [debug] Add/Modify event for my-release-spark-operator-webhook-init: MODIFIED
client.go:779: [debug] my-release-spark-operator-webhook-init: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed pre-install: 1 error occurred:
* timed out waiting for the condition
helm.go:84: [debug] failed pre-install: 1 error occurred:
* timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:158
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.8.0/command.go:1039
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:267
runtime.goexit
runtime/asm_amd64.s:1650
Error: Process completed with exit code 1.
Environment & Versions
- Spark Operator App version: v1beta2-1.4.2-3.5.0
- Helm Chart Version: 1.3.0
- Kubernetes Version:
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
- Apache Spark version: None at this stage
Additional context
Forced to use Chart version 1.2.7 to make it work
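The debug log shows Helm stuck on the pre-install webhook-init Job, which stays at "Jobs active: 1, jobs succeeded: 0" until the 5m0s wait expires. A possible way to diagnose this on the kind cluster while the install hangs (a sketch; resource names are taken from the log above, and `--timeout 10m` is only a workaround if the Job is slow rather than broken):

```shell
# Inspect the pre-install hook Job that Helm is waiting on
kubectl -n spark-operator get jobs,pods

# Check why the webhook-init Job is not completing
kubectl -n spark-operator describe job my-release-spark-operator-webhook-init
kubectl -n spark-operator logs job/my-release-spark-operator-webhook-init

# Workaround: give Helm more time than its default 5m wait
helm install my-release spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set webhook.enable=true --timeout 10m --debug
```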
> Forced to use Chart version 1.2.7 to make it work
Can you please specify here the helm install command?
I have a similar problem (timeout here too, Mac M2):
helm install spark-operator/spark-operator --namespace spark-operator --set sparkJobNamespace=default --set webhook.enable=true --generate-name --debug
UPD: it seems this works:
helm install eee spark-operator/spark-operator --namespace spark-operator --set sparkJobNamespace=default --set webhook.enable=true --debug --version 1.2.7
> Forced to use Chart version 1.2.7 to make it work
Fun fact: this is the only working version (1.2.5 also times out).
Do you have any problems with 1.2.7? For example, I don't see driver pods being created while running the spark-pi example; maybe that's because this is my first time using Kubernetes.
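When the driver pod for the spark-pi example does not appear, something like this may help to narrow it down (a sketch; the deployment name in the last command depends on your release name, `eee` here as in the command above):

```shell
# See whether the operator accepted the SparkApplication and its current state
kubectl -n default get sparkapplications
kubectl -n default describe sparkapplication spark-pi

# Driver pods created by the operator carry the spark-role=driver label
kubectl -n default get pods -l spark-role=driver

# If nothing appears, the operator's own logs usually say why
# (deployment name assumed from the release name "eee")
kubectl -n spark-operator logs deploy/eee-spark-operator
```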
#
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:3.5.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  sparkVersion: "3.5.0"
  sparkUIOptions:
    serviceLabels:
      test-label/v1: 'true'
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.5.0
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.5.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
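Assuming the manifest above is saved as spark-pi.yaml (the filename is an assumption), it can be submitted and watched like this:

```shell
# Submit the SparkApplication and watch its status transitions
kubectl apply -f spark-pi.yaml
kubectl -n default get sparkapplication spark-pi -w

# Once a driver pod exists, its name follows the <app-name>-driver convention
kubectl -n default logs spark-pi-driver
```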