canonical/seldon-core-operator

Seldon (edge-bundle) failing in charmed kubernetes deployment

misohu opened this issue · 2 comments

After deploying the edge bundle to Charmed Kubernetes on AWS, seldon-core is failing.

Command used to deploy:

juju deploy kubeflow --channel=latest/edge --trust

Container logs:

❯ kubectl logs -f seldon-controller-manager-54cd8dcfc-prvxz -n kubeflow
Defaulted container "seldon-core" out of: seldon-core, juju-pod-init (init)
{"level":"info","ts":1674829552.0947893,"logger":"setup","msg":"Intializing operator"}
{"level":"info","ts":1674829552.1163418,"logger":"setup","msg":"CRD not found - trying to create"}
{"level":"error","ts":1674829552.2253666,"logger":"setup","msg":"unable to initialise operator","error":"the server could not find the requested resource","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nmain.main\n\t/workspace/main.go:149\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
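The controller emits one JSON object per log line, so the error entries can be isolated with a plain text filter. The snippet below is a minimal sketch: `filter_errors` is a hypothetical helper name, and it assumes the `"level":"error"` field is serialized exactly as in the output above (no spaces around the colon).

```shell
# Print only error-level entries from JSON log lines read on stdin.
# Assumes the field is written as "level":"error" with no whitespace,
# as in the controller output shown in this report.
filter_errors() {
  grep -F '"level":"error"'
}

# Example usage (pod name and namespace taken from the report):
# kubectl logs seldon-controller-manager-54cd8dcfc-prvxz -n kubeflow | filter_errors
```

A fixed-string match keeps the helper free of extra dependencies; a JSON-aware tool would be more robust if the log format varies.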

Full logs are attached: dump.log

Seldon-only logs:

❯ juju debug-log --replay | grep seldon
controller-0: 14:56:45 INFO juju.worker.caasapplicationprovisioner.runner start "seldon-controller-manager"
controller-0: 15:06:24 INFO juju.worker.caasprovisioner started operator for application "seldon-controller-manager"
application-seldon-controller-manager: 15:06:26 INFO juju.cmd running jujud [2.9.34 90e2f047763059f0b8a57941ae0907346464aee8 gc go1.19]
application-seldon-controller-manager: 15:06:26 DEBUG juju.cmd   args: []string{"/var/lib/juju/tools/jujud", "caasoperator", "--application-name=seldon-controller-manager", "--debug"}
application-seldon-controller-manager: 15:06:26 DEBUG juju.agent read agent config, format "2.0"
application-seldon-controller-manager: 15:06:26 INFO juju.worker.upgradesteps upgrade steps for 2.9.34 have already been run.
application-seldon-controller-manager: 15:06:26 INFO juju.cmd.jujud caas operator application-seldon-controller-manager start (2.9.34 [gc])
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "clock" manifold worker started at 2023-01-27 14:06:26.196880263 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "agent" manifold worker started at 2023-01-27 14:06:26.197542092 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "upgrade-steps-gate" manifold worker started at 2023-01-27 14:06:26.198022058 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.introspection introspection worker listening on "@jujud-application-seldon-controller-manager"
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "caas-units-manager" manifold worker started at 2023-01-27 14:06:26.1988468 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.introspection stats worker now serving
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.apicaller connecting with old password
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "api-config-watcher" manifold worker started at 2023-01-27 14:06:26.208160367 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "upgrade-steps-flag" manifold worker started at 2023-01-27 14:06:26.210204205 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "migration-fortress" manifold worker started at 2023-01-27 14:06:26.220874831 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.api successfully dialed "wss://172.31.25.210:17070/model/3ebd7f9f-23d7-4c10-82d4-68a99d6006b4/api"
application-seldon-controller-manager: 15:06:26 INFO juju.api connection established to "wss://172.31.25.210:17070/model/3ebd7f9f-23d7-4c10-82d4-68a99d6006b4/api"
application-seldon-controller-manager: 15:06:26 INFO juju.worker.apicaller [3ebd7f] "application-seldon-controller-manager" successfully connected to "172.31.25.210:17070"
application-seldon-controller-manager: 15:06:26 DEBUG juju.api RPC connection died
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "api-caller" manifold worker completed successfully
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.apicaller connecting with old password
application-seldon-controller-manager: 15:06:26 DEBUG juju.api successfully dialed "wss://172.31.25.210:17070/model/3ebd7f9f-23d7-4c10-82d4-68a99d6006b4/api"
application-seldon-controller-manager: 15:06:26 INFO juju.api connection established to "wss://172.31.25.210:17070/model/3ebd7f9f-23d7-4c10-82d4-68a99d6006b4/api"
application-seldon-controller-manager: 15:06:26 INFO juju.worker.apicaller [3ebd7f] "application-seldon-controller-manager" successfully connected to "172.31.25.210:17070"
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "api-caller" manifold worker started at 2023-01-27 14:06:26.255014228 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "caas-units-manager" manifold worker completed successfully
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "caas-units-manager" manifold worker started at 2023-01-27 14:06:26.263643616 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "upgrade-steps-runner" manifold worker started at 2023-01-27 14:06:26.265520702 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "upgrade-steps-runner" manifold worker completed successfully
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "upgrader" manifold worker started at 2023-01-27 14:06:26.267809453 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "log-sender" manifold worker started at 2023-01-27 14:06:26.267974806 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "migration-minion" manifold worker started at 2023-01-27 14:06:26.268182889 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "migration-inactive-flag" manifold worker started at 2023-01-27 14:06:26.27124827 +0000 UTC
application-seldon-controller-manager: 15:06:26 INFO juju.worker.caasupgrader abort check blocked until version event received
application-seldon-controller-manager: 15:06:26 INFO juju.worker.migrationminion migration phase is now: NONE
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.caasupgrader current agent binary version: 2.9.34
application-seldon-controller-manager: 15:06:26 INFO juju.worker.caasupgrader unblocking abort check
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.logger initial log config: "<root>=DEBUG"
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "proxy-config-updater" manifold worker started at 2023-01-27 14:06:26.284435601 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "charm-dir" manifold worker started at 2023-01-27 14:06:26.284655444 +0000 UTC
application-seldon-controller-manager: 15:06:26 INFO juju.worker.logger logger worker started
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "api-address-updater" manifold worker started at 2023-01-27 14:06:26.284730685 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "logging-config-updater" manifold worker started at 2023-01-27 14:06:26.284764145 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.dependency "hook-retry-strategy" manifold worker started at 2023-01-27 14:06:26.314754686 +0000 UTC
application-seldon-controller-manager: 15:06:26 DEBUG juju.worker.logger reconfiguring logging from "<root>=DEBUG" to "<root>=INFO"
application-seldon-controller-manager: 15:06:26 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
application-seldon-controller-manager: 15:06:26 INFO juju.worker.caasoperator.charm downloading ch:amd64/focal/seldon-core-58 from API server
application-seldon-controller-manager: 15:06:26 INFO juju.downloader downloading from ch:amd64/focal/seldon-core-58
application-seldon-controller-manager: 15:06:26 INFO juju.downloader download complete ("ch:amd64/focal/seldon-core-58")
application-seldon-controller-manager: 15:06:26 INFO juju.downloader download verified ("ch:amd64/focal/seldon-core-58")
application-seldon-controller-manager: 15:06:32 INFO juju.worker.caasoperator operator "seldon-controller-manager" started
application-seldon-controller-manager: 15:06:32 INFO juju.worker.caasoperator.runner start "seldon-controller-manager/0"
application-seldon-controller-manager: 15:06:32 INFO juju.worker.leadership seldon-controller-manager/0 promoted to leadership of seldon-controller-manager
application-seldon-controller-manager: 15:06:32 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-seldon-controller-manager-0
application-seldon-controller-manager: 15:06:32 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0 unit "seldon-controller-manager/0" started
application-seldon-controller-manager: 15:06:32 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0 resuming charm install
application-seldon-controller-manager: 15:06:32 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.charm downloading ch:amd64/focal/seldon-core-58 from API server
application-seldon-controller-manager: 15:06:32 INFO juju.downloader downloading from ch:amd64/focal/seldon-core-58
application-seldon-controller-manager: 15:06:32 INFO juju.downloader download complete ("ch:amd64/focal/seldon-core-58")
application-seldon-controller-manager: 15:06:32 INFO juju.downloader download verified ("ch:amd64/focal/seldon-core-58")
application-seldon-controller-manager: 15:06:39 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0 hooks are retried true
application-seldon-controller-manager: 15:06:39 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0 found queued "install" hook
application-seldon-controller-manager: 15:06:40 INFO unit.seldon-controller-manager/0.juju-log Running legacy hooks/install.
application-seldon-controller-manager: 15:06:45 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "install" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:06:45 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0 found queued "leader-elected" hook
application-seldon-controller-manager: 15:06:48 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:06:55 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "config-changed" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:06:55 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0 found queued "start" hook
application-seldon-controller-manager: 15:06:55 INFO unit.seldon-controller-manager/0.juju-log Running legacy hooks/start.
application-seldon-controller-manager: 15:06:57 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "start" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:08:12 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "config-changed" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:09:42 INFO juju.worker.caasoperator started pod init on "seldon-controller-manager/0"
application-seldon-controller-manager: 15:11:11 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "update-status" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:16:25 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "update-status" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:22:16 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "update-status" hook (via hook dispatching script: dispatch)
application-seldon-controller-manager: 15:27:56 INFO juju.worker.caasoperator.uniter.seldon-controller-manager/0.operation ran "update-status" hook (via hook dispatching script: dispatch)

1.6/stable is working for Charmed Kubeflow.

Hi @misohu, the latest/edge channel you are deploying from points to a bundle with outdated versions of the charms. Do you mind re-deploying with 1.6/edge or 1.6/stable instead of latest/edge and sharing your results?
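As a rough sketch of the suggested redeploy, assuming the bundle is deployed with `juju deploy` and the same `--trust` flag as in the report: `kubeflow_deploy_cmd` is a hypothetical helper that only prints the command for a given channel, so it can be reviewed before running it against a real model.

```shell
# Print the deploy command for a given bundle channel (default: 1.6/stable).
# This only prints the command; it does not contact a Juju controller.
kubeflow_deploy_cmd() {
  printf 'juju deploy kubeflow --channel=%s --trust\n' "${1:-1.6/stable}"
}

# Example usage:
# kubeflow_deploy_cmd 1.6/edge
# To run it for real (assumes a bootstrapped controller and a model for Kubeflow):
# eval "$(kubeflow_deploy_cmd 1.6/stable)"
```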

This helped :)