Failed to create Dcluster object
stuartleeks commented
I've just installed v0.30 and attempted to create a Dcluster using the config/samples YAML.
Kubernetes version 1.13.10
I get the following error in the operator logs:
2019-10-14T17:55:03.104Z INFO controllers.Dcluster Starting reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:03.104Z INFO controllers.Dcluster AddFinalizer for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z INFO controllers.Dcluster Finish reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "dcluster", "request": "kubeflow/dcluster-sample"}
2019-10-14T17:55:03.128Z INFO controllers.Dcluster Starting reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z INFO controllers.Dcluster Submit for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z INFO controllers.Dcluster Create cluster dcluster-sample
2019-10-14T17:55:03.128Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Dcluster","namespace":"kubeflow","name":"dcluster-sample","uid":"bf7ba102-eeab-11e9-a0ba-1e18e514b3df","apiVersion":"databricks.microsoft.com/v1alpha1","resourceVersion":"8427"}, "reason": "Added", "message": "Object finalizer is added"}
2019-10-14T17:55:10.006Z INFO controllers.Dcluster Finish reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:10.006Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "dcluster", "request": "kubeflow/dcluster-sample"}
2019-10-14T17:55:10.006Z INFO controllers.Dcluster Starting reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:10.007Z INFO controllers.Dcluster Refresh for kubeflow/dcluster-sample
2019-10-14T17:55:10.007Z INFO controllers.Dcluster Refresh cluster dcluster-sample
2019-10-14T17:55:10.007Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Dcluster","namespace":"kubeflow","name":"dcluster-sample","uid":"bf7ba102-eeab-11e9-a0ba-1e18e514b3df","apiVersion":"databricks.microsoft.com/v1alpha1","resourceVersion":"8443"}, "reason": "Submitted", "message": "Object is submitted"}
2019-10-14T17:55:10.704Z INFO controllers.Dcluster Finish reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:10.704Z ERROR controller-runtime.controller Reconciler error {"controller": "dcluster", "request": "kubeflow/dcluster-sample", "error": "error when refreshing cluster: unexpected end of JSON input"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0-beta.4/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0-beta.4/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0-beta.4/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:88
$ k get dclusters.databricks.microsoft.com
NAME              AGE   CLUSTERID              STATE   NUMWORKERS
dcluster-sample   2m    1014-175509-erred163
$ k describe dclusters.databricks.microsoft.com dcluster-sample
Name:         dcluster-sample
Namespace:    kubeflow
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"databricks.microsoft.com/v1alpha1","kind":"Dcluster","metadata":{"annotations":{},"name":"dcluster-sample","namespace":"kub...
API Version:  databricks.microsoft.com/v1alpha1
Kind:         Dcluster
Metadata:
  Creation Timestamp:  2019-10-14T17:55:03Z
  Finalizers:
    dcluster.finalizers.databricks.microsoft.com
  Generation:        2
  Resource Version:  8443
  Self Link:         /apis/databricks.microsoft.com/v1alpha1/namespaces/kubeflow/dclusters/dcluster-sample
  UID:               bf7ba102-eeab-11e9-a0ba-1e18e514b3df
Spec:
  Autoscale:
    max_workers:  5
    min_workers:  2
  cluster_name:   dcluster-sample
  node_type_id:   Standard_D3_v2
  spark_version:  5.3.x-scala2.11
Status:
  cluster_info:
    cluster_cores:  0
    cluster_id:     1014-175509-erred163
Events:
  Type    Reason     Age    From                 Message
  ----    ------     ----   ----                 -------
  Normal  Added      3m40s  dcluster-controller  Object finalizer is added
  Normal  Submitted  3m33s  dcluster-controller  Object is submitted
stuartleeks commented
Looking into this a bit deeper, I believe that this is caused by a bug in the SDK: https://github.com/xinsnake/databricks-sdk-golang/issues/2
stuartleeks commented
@Azadehkhojandi This still seems to be happening. The cluster gets created, but the CRD status has only minimal cluster_info and the error still appears in the operator logs. (Thanks to @storey247 for testing!)
stuartleeks commented
Closing this, as #107 merges in the SDK fix.