Allow 'etcdctl' and 'fleetctl' to work on the worker nodes according to CoreOS Docs

Question

grempe opened this issue 10 years ago · 4 comments

As it stands, etcdctl and fleetctl do not work properly on the worker nodes, since they don't know how to find the master node's etcd. Following the instructions in the CoreOS docs, I was able to get them working on the worker nodes by adding a few environment variables. Perhaps you could bake this into your (very helpful) tool for everyone?

The documentation for this was found in the 'Easy Development/Testing Cluster' section of this page:
https://coreos.com/docs/cluster-management/setup/cluster-architectures/

Based on that, here are the env vars I manually added on each worker node to get things working (so far) for me:

export ETCDCTL_PEERS="http://172.17.15.101:4001"
export FLEETCTL_ENDPOINT=unix:///var/run/fleet.sock
export FLEETCTL_EXPERIMENTAL_API=true
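
As a minimal sketch, the same three settings can be derived from a single variable so the control node's address only appears once. `CONTROL_IP` is a hypothetical name introduced here for illustration; the default is the address used in this issue, and port 4001 is taken from the exports above.

```shell
# CONTROL_IP is an assumed variable name, not something the tool defines.
# Default to the control node address quoted in this issue.
CONTROL_IP="${CONTROL_IP:-172.17.15.101}"

# Point etcdctl at the control node's etcd instead of 127.0.0.1:4001.
export ETCDCTL_PEERS="http://${CONTROL_IP}:4001"

# fleetctl talks to the local fleet socket, as in the exports above.
export FLEETCTL_ENDPOINT="unix:///var/run/fleet.sock"
export FLEETCTL_EXPERIMENTAL_API=true
```

If your control node has a different address, set `CONTROL_IP` before sourcing these lines.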

After this I am able to run any fleetctl or etcdctl commands on the nodes without seeing errors like:

core@k8snode-01 /run/systemd/system $ fleetctl list-machines
2015/03/15 22:50:33 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/03/15 22:50:33 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
2015/03/15 22:50:33 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/03/15 22:50:33 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 200ms
...

For a more permanent solution, I also followed the directions in that documentation and added a 'write_files' section that applies this config to every worker machine automatically. Here is a working gist (see the end of the file for the changes):

https://gist.github.com/grempe/92704b8bdecf19530607
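
The gist does this through cloud-config. For anyone who just wants to persist the settings by hand, a rough shell equivalent might look like the sketch below. `write_client_env` and the destination file are hypothetical and are not part of the tool or the gist; the example writes to a scratch file so it is safe to try, and you would substitute a file that your login shell actually sources.

```shell
# write_client_env is a hypothetical helper for illustration only.
# $1: control node IP, $2: destination file to write the exports into.
write_client_env() {
  control_ip="$1"
  dest="$2"
  cat > "$dest" <<EOF
export ETCDCTL_PEERS="http://${control_ip}:4001"
export FLEETCTL_ENDPOINT=unix:///var/run/fleet.sock
export FLEETCTL_EXPERIMENTAL_API=true
EOF
}

# Example: write to a scratch file and load it into the current shell.
env_file="$(mktemp)"
write_client_env 172.17.15.101 "$env_file"
. "$env_file"
```

After sourcing the file, fleetctl and etcdctl pick up the settings for the rest of that shell session.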

Answer 1 · 2015-03-16T09:56:39.000Z

@grempe sure, we can include that so etcdctl and fleetctl on the workers can connect to the control peer.
I have set that in my development and production clusters, although those are not k8s-based yet :-)
But the change will do no harm in k8s clusters.

And yes, https://coreos.com/docs/cluster-management/setup/cluster-architectures/ must be read before setting up any CoreOS cluster :-) I see many bad examples of CoreOS cluster setups where all cluster peers are part of the etcd cluster :)

Answer 2 · 2015-03-16T11:24:33.000Z

PR merged and released

Answer 3 · 2015-03-16T16:06:38.000Z

Agreed, I am relatively new to CoreOS and that document was very helpful. Initially I thought there was an etcd cluster between your three VMs, and I was trying to get that to work (and having fun with discovery URLs etc.). That document helped me understand the no-cluster approach you were taking in this tool (and how to get everything to work as expected). The no-cluster approach makes a lot of sense for a dev env where you don't want to be hunting down discovery URLs every time you re-launch.

Good learning experience for me troubleshooting it.

Cheers.

Answer 4 · 2015-03-16T16:27:32.000Z

I got bitten badly when setting up many CoreOS clusters for production and running into problems afterwards.
Once I moved everything to a cluster with 3 control/etcd nodes (GCE g1-small instances) plus many workers, with the servers grouped by the work they do, everything became easy to manage and upgrade.

E.g. even for my development I use just one control/etcd node (GCE g1-small) and 5 workers, and it works fantastically too.