Simple installation of Hephy wouldn't work. Client on Mac OS X and Cluster on AWS
jofell opened this issue · 13 comments
I have been getting these error messages from a simple install of Hephy. I would really appreciate it if anyone could let me know why I'm receiving these errors (I was assuming these should be addressed by the installer itself?):
Error: validation failed: [secrets "minio-user" not found, secrets "deis-router-dhparam" not found, secrets "objectstorage-keyfile" not found, configmaps "dockerbuilder-config" not found, configmaps "slugbuilder-config" not found, configmaps "slugrunner-config" not found, serviceaccounts "deis-builder" not found, serviceaccounts "deis-controller" not found, serviceaccounts "deis-logger-fluentd" not found, serviceaccounts "deis-logger" not found, serviceaccounts "deis-monitor-telegraf" not found, serviceaccounts "deis-nsqd" not found, serviceaccounts "deis-registry" not found, serviceaccounts "deis-router" not found, serviceaccounts "deis-workflow-manager" not found, services "deis-builder" not found, services "deis-controller" not found, services "deis-logger" not found, services "deis-monitor-grafana" not found, services "deis-monitor-influxapi" not found, services "deis-monitor-influxui" not found, services "deis-nsqd" not found, services "deis-logger-redis" not found, services "deis-registry" not found, services "deis-router" not found, services "deis-workflow-manager" not found, daemonsets.extensions "deis-logger-fluentd" not found, daemonsets.extensions "deis-monitor-telegraf" not found, daemonsets.extensions "deis-registry-proxy" not found, deployments.extensions "deis-builder" not found, deployments.extensions "deis-controller" not found, deployments.extensions "deis-logger" not found, deployments.extensions "deis-monitor-grafana" not found, deployments.extensions "deis-monitor-influxdb" not found, deployments.extensions "deis-nsqd" not found, deployments.extensions "deis-logger-redis" not found, deployments.extensions "deis-registry" not found, deployments.extensions "deis-router" not found, deployments.extensions "deis-workflow-manager" not found]
What version of helm are you using to install hephy workflow? Any ideas @kingdonb ?
Kubernetes v1.6.13
Helm 2.6.14
Kubectl 1.6.4
kops 1.16.0
AWS with image kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2018-08-17
Hephy 2.21.4
It would be great if you could teach me how to install it (I'm kind of a newbie).
My guess is it's my tiller setup, but I'm not really sure what the correct tiller configuration is.
That's a pretty old k8s and Helm version, which if anything probably means you should be in good shape for Workflow. I've had enough issues with upgrades lately that I won't suggest a drastic all-at-once upgrade route...
The recommended version of Helm right now for Hephy v2.21.4 and lower is still Helm 2, although I have had good luck with Helm 3. Kubernetes kubelet and kubectl must remain below v1.16.x (the supported version is <= 1.15.x) for now, as we have not finalized the apps/v1 updates and extensions/v1beta1 deprecation handling required for newer versions of Kubernetes.
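On the tiller question: for a test cluster, the usual Helm 2 setup is just a tiller service account with a cluster-admin binding, roughly like this (a sketch assuming tiller runs in kube-system and cluster-admin is acceptable for your environment):
$ kubectl -n kube-system create serviceaccount tiller
$ kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
$ helm init --service-account tiller --wait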
We have some issues with our documentation right now, sorry this is not posted anywhere more prominently. We should be able to support newer versions of Kubernetes soon. If you are starting now, a test cluster built with https://github.com/weaveworks/wks-quickstart-firekube will come up with the right versions (v1.14.1 by default). It's a great environment for testing without a heavy investment in cloud resources: you can run a multi-node trial cluster entirely from Docker containers on a MacBook Pro, or from KVM machines on a single node if your host OS is Linux.
Any Kubernetes with the right version requirements should work out of the box with the directions on https://web.teamhephy.com. Legacy docs are available at https://docs.teamhephy.com, which go into more detail about best practices and configuration for production. Glad to have you join us @jofell, and I hope @felixbuenemann was able to help you this morning in our Slack community.
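Once tiller is up, the install against those directions boils down to adding the chart repo and installing the workflow chart, something like this (I'm writing the repo URL from memory, so verify it against the docs):
$ helm repo add hephy https://charts.teamhephy.com
$ helm install hephy/workflow --name hephy --namespace deis --set global.use_rbac=true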
Is it too corny to celebrate I just happened to notice this:
...looks like you're our 100th customer? 🤣 We welcome newbies of all kinds and anyone else who works with Workflow, and whether your experience is broken or positive we hope you will tell us more about it either way as time goes on, so we can support our community better.
Btw, I'm successfully running on Kubernetes 1.17 with the following kube-apiserver flag:
--runtime-config=extensions/v1beta1/daemonsets=true,extensions/v1beta1/deployments=true,extensions/v1beta1/replicasets=true,extensions/v1beta1/networkpolicies=true
This effectively re-enables the deprecated extensions/v1beta1 APIs on 1.16 and 1.17.
I have not verified if that still works on 1.18.
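For anyone running kops like the cluster in this report, the same thing should be settable in the cluster spec instead of on the command line. A rough sketch, assuming kops exposes this as kubeAPIServer.runtimeConfig (edit with kops edit cluster, then do a rolling update of the masters to pick it up):
spec:
  kubeAPIServer:
    runtimeConfig:
      extensions/v1beta1/daemonsets: "true"
      extensions/v1beta1/deployments: "true"
      extensions/v1beta1/replicasets: "true"
      extensions/v1beta1/networkpolicies: "true"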
OK, don't celebrate yet. We have run down this issue, and the finding is that Helm 2.16.x has a bug which prevents Workflow from being installed even on the older/correct kubelet versions for Workflow v2.21.4:
[kube] 2020/03/28 14:43:34 get relation pod of object: deis/Service/deis-router
[kube] 2020/03/28 14:43:34 get relation pod of object: deis/DaemonSet/deis-registry-proxy
[kube] 2020/03/28 14:43:34 get relation pod of object: deis/DaemonSet/deis-logger-fluentd
[kube] 2020/03/28 14:43:34 get relation pod of object: deis/DaemonSet/deis-monitor-telegraf
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14e314f]
goroutine 832 [running]:
k8s.io/helm/pkg/kube.getSelectorFromObject(0x1c01c40, 0xc00087a000, 0xc00087a000, 0x0)
/go/src/k8s.io/helm/pkg/kube/client.go:924 +0x26f
k8s.io/helm/pkg/kube.(*Client).getSelectRelationPod(0xc00030a3a0, 0xc000238e00, 0xc000af8030, 0xc000b07770, 0x18, 0xc0000ea648)
/go/src/k8s.io/helm/pkg/kube/client.go:1104 +0x195
k8s.io/helm/pkg/kube.(*Client).Get.func2(0xc000238e00, 0x0, 0x0)
/go/src/k8s.io/helm/pkg/kube/client.go:366 +0xd1
k8s.io/helm/pkg/kube.batchPerform.func1(0xc00008f0e0, 0xc00079aea0, 0xc0003e0e10, 0xc000238e00)
/go/src/k8s.io/helm/pkg/kube/client.go:752 +0x30
created by k8s.io/helm/pkg/kube.batchPerform
/go/src/k8s.io/helm/pkg/kube/client.go:751 +0xb8
The latest Helm 2 release as of today is v2.16.5, and if your tiller is running that version, or perhaps any 2.16.x version, you will see the panic above if you tail the tiller logs; if you are just running the helm client interactively, it looks like this:
$ helm upgrade --install hephy hephy/workflow --namespace deis --set global.use_rbac=true
Release "hephy" has been upgraded.
Error: transport is closing
We backed down to Helm v2.15.2 and don't have this problem anymore; everything seems to come up OK. The Helm 2.x series is probably not long for this world, though, so that's not too helpful on its own. I'm going to try to confirm that I am able to use Helm 3 to install Hephy, since I have not had issues with the latest Helm versions in the past. I have heard a lot of people say that you need to keep both Helm 2 and Helm 3 around these days; I am not sure that's true.
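If you want to check what your tiller is running and roll it back the same way, something along these lines should do it (this assumes a default helm init into kube-system; v2.15.2 is the tag we backed down to):
$ helm version --short
$ kubectl -n kube-system logs -f deployment/tiller-deploy
$ helm init --upgrade --force-upgrade --tiller-image gcr.io/kubernetes-helm/tiller:v2.15.2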
It's definitely possible that all of these things are connected by a single thread: an issue with that specific combination of versions.
Thanks for letting us know about the workaround for re-enabling the deprecated APIs; I can try it out on my 1.18 cluster next time.
All indications from my own testing are that Helm v3.1.2 (the latest version as of this writing) works fine with Hephy Workflow v2.21.4 (also the latest version as of today) on a Kubernetes cluster running v1.15.11. I could not say the same about Helm v2.16.4 and v2.16.5, which both seemed to bomb out in different ways when installing Workflow today.
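For anyone following along, the Helm 3 path looks roughly like this; same chart and values, with the namespace created up front since Helm 3 no longer creates it for you (again, the repo URL is from memory):
$ kubectl create namespace deis
$ helm repo add hephy https://charts.teamhephy.com
$ helm install hephy hephy/workflow --namespace deis --set global.use_rbac=true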
Let's leave this issue report open until we can offer a better answer about what's going on with Helm v2.16.x, but I'm not too worried about it, as it doesn't seem to be a problem in the latest modern-era Helm 3 releases, which run without tiller.
More urgent right now is making sure we can support a smooth transition to a v1.16.x-and-up control plane, which is coming soon.
@jofell Since you figured it out, I think this issue can be closed. Maybe add a closing comment about what the problem/solution was, in case others run into the same problem.
Hi guys, great work and support. Really appreciate all the help.
Basically the Kubernetes version sweet spot is 1.15 with a working Helm version for it (2.14). But I'm happy you guys saw there's an issue here as well.
Hope to get help / help the project more in the future!
@kingdonb @felixbuenemann Thank you for helping with this issue! @jofell let us know what else we can help with in the Slack channel.
@felixbuenemann Have you verified if this works with 1.18? From my experience it doesn't look like it does...
@dmcnaught No I haven't, I'm still on 1.17.
thanks @felixbuenemann