kubernetes-retired/kube-aws

upgrading 0.14 to 0.15: etcd migration failed; controller kubelet fails

flah00 opened this issue · 1 comments

TL;DR
By manually editing the etcd.json.tmpl template, I was able to work around the etcd migration failure. I have not been able to fix the controller-related issue.

Etcd

The export-existing-etcd-state.service unit was failing with the error:

Failed to start Exports Kubernetes Values from a remote Etcd cluster

This was because ETCD_ENDPOINTS was configured to use private host names in /var/run/coreos/etcdadm-environment-migration. To work around this, I hard-coded the public host names in stack-templates/etcd.json.tmpl:

diff --git a/k9s-zoo/stack-templates/etcd.json.tmpl b/k9s-zoo/stack-templates/etcd.json.tmpl
index a34df2d..8c3dabc 100644
--- a/k9s-zoo/stack-templates/etcd.json.tmpl
+++ b/k9s-zoo/stack-templates/etcd.json.tmpl
@@ -411,9 +411,7 @@
               {{ if $.EtcdMigrationEnabled -}}
               "/var/run/coreos/etcdadm-environment-migration": {
                 "content": { "Fn::Join" : [ "", [
-                  "ETCD_ENDPOINTS='",
-                    "{{ $.EtcdMigrationExistingEndpoints }}",
-                  "'\n",
+                  "ETCD_ENDPOINTS='https://PUBLIC_HOST_1:2379,https://PUBLIC_HOST_2:2379,https://PUBLIC_HOST_3:2379'",
                   "AWS_DEFAULT_REGION='",
                     "{{$.Region}}",
                   "'\n",
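One way to see which endpoints the migration unit will pick up is to parse the environment file and try to resolve each host from the node. The following is a minimal sketch, assuming the file contains plain KEY='value' lines as in the template above, and assuming the failure is a name-resolution problem (the issue does not say whether resolution or routing failed). The helper names `parse_env`, `endpoint_hosts`, `resolvable`, and `check_file` are hypothetical and not part of kube-aws.

```python
import socket
from urllib.parse import urlsplit


def parse_env(text):
    """Parse simple KEY='value' lines into a dict (no shell expansion)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env


def endpoint_hosts(env):
    """Return (host, port) pairs from the comma-separated ETCD_ENDPOINTS value."""
    pairs = []
    for url in env.get("ETCD_ENDPOINTS", "").split(","):
        url = url.strip()
        if url:
            parts = urlsplit(url)
            pairs.append((parts.hostname, parts.port))
    return pairs


def resolvable(host):
    """Return True if the local resolver can look up the host name."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def check_file(path):
    """Print each endpoint host in the file and whether it resolves locally."""
    with open(path) as fh:
        for host, port in endpoint_hosts(parse_env(fh.read())):
            status = "resolves" if resolvable(host) else "does not resolve"
            print(f"{host}:{port} {status}")
```

Running `check_file("/var/run/coreos/etcdadm-environment-migration")` on an etcd node would list each endpoint host with its resolution status. A host that resolves may still be unreachable, so this only rules out one class of failure.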

Controller

Once I get past etcd, the kubelet on the controllers fails with a networking error: the DNS lookup for the EC2 API endpoint is sent to [::1]:53 and the connection is refused.

Jul 03 18:55:52 HOST.ec2.internal sh[20858]: F0703 18:55:52.153512   20858 server.go:273] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-006bdcc9632d50a6c: "error listing AWS instances: \"RequestError: send request failed\\ncaused by: Post https://ec2.us-east-1.amazonaws.com/: dial tcp: lookup ec2.us-east-1.amazonaws.com on [::1]:53: read udp [::1]:50109->[::1]:53: read: connection refused\""
Jul 03 18:55:52 HOST.ec2.internal systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
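The `lookup ... on [::1]:53` part of the error shows the kubelet querying a resolver on localhost. Go's resolver falls back to localhost port 53 when /etc/resolv.conf is missing or lists no nameservers, so one thing worth checking (a possible cause, not a confirmed one) is what the controller's resolv.conf contains when the kubelet starts. A minimal sketch of that check, with hypothetical helper names:

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def is_loopback(address):
    """Return True for loopback addresses such as 127.0.0.1 or ::1."""
    try:
        # Drop an IPv6 zone suffix (e.g. fe80::1%eth0) before parsing.
        return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback
    except ValueError:
        return False


def diagnose(resolv_conf_text):
    """Summarize whether usable, non-loopback nameservers are configured."""
    servers = nameservers(resolv_conf_text)
    if not servers:
        return "no nameservers configured"
    if all(is_loopback(s) for s in servers):
        return "only loopback nameservers: " + ", ".join(servers)
    return "nameservers: " + ", ".join(servers)
```

Calling `diagnose(open("/etc/resolv.conf").read())` on a controller reports whether any non-loopback nameserver is configured. An empty or loopback-only result would be consistent with the refused connection to [::1]:53 in the log above.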

The controller issues are all related to the aws-iam-auth plugin.
These issues were not present when I updated the feature on the 0.14.x branch.