Problems bootstrapping the etcd cluster
Closed this issue · 2 comments
Hello again,
I'm now in part 5 of the tutorial and I'm kind of stuck getting the etcd cluster up and running. It seems the etcd install succeeded and the service did start on all 3 controllers, however I cannot list the cluster members:
Also, it seems the controllers are unable to communicate with one another (the logs are identical on all controllers):
Any help would be more than welcome!
Thanks
Hard to tell from here, but in general the command for checking the etcd members is:
ETCDCTL_API=3 etcdctl member list
645277c31f2e59fe, started, k8s-controller1, https://10.3.0.201:2380, https://10.3.0.201:2379
a81925033e34d269, started, k8s-controller2, https://10.3.0.202:2380, https://10.3.0.202:2379
ecf70543fa3a5935, started, k8s-controller3, https://10.3.0.203:2380, https://10.3.0.203:2379
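Since the members above advertise https URLs, etcdctl may also need to be told which endpoint to talk to and which TLS credentials to use. A minimal sketch, assuming the certificates live under /etc/etcd (the three file paths and the function name are placeholders, not taken from the tutorial; --endpoints, --cacert, --cert and --key are regular etcdctl v3 flags):

```shell
#!/bin/sh
# Hypothetical certificate locations; override via environment or edit
# them to match wherever your setup placed the files.
ETCD_CA="${ETCD_CA:-/etc/etcd/ca.pem}"
ETCD_CERT="${ETCD_CERT:-/etc/etcd/cert-etcd.pem}"
ETCD_KEY="${ETCD_KEY:-/etc/etcd/cert-etcd-key.pem}"

# List the cluster members through a single client endpoint,
# presenting the TLS client certificate and trusting the given CA.
etcd_member_list() {
    endpoint="$1"
    ETCDCTL_API=3 etcdctl \
        --endpoints="$endpoint" \
        --cacert="$ETCD_CA" \
        --cert="$ETCD_CERT" \
        --key="$ETCD_KEY" \
        member list
}

# Example call against the first controller's client port:
# etcd_member_list https://10.3.0.201:2379
```

If this fails with a TLS error while the plain network checks succeed, the certificates become the more likely suspect.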
I would first check network connectivity, beginning with a ping to all etcd member IPs. If that works, then at least the basic VPN connectivity between the nodes should be OK.
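The ping check can be run against all members in one go. A rough sketch, assuming a Linux (iputils) ping where -W is the reply timeout in seconds; the function name ping_members is made up for this example:

```shell
#!/bin/sh
# Ping each given address once and print one result line per host.
# -c 1: send a single packet; -W 2: wait at most 2 seconds for the reply
# (Linux iputils semantics, other ping variants may differ).
# Returns non-zero if at least one host did not answer.
ping_members() {
    failed=0
    for ip in "$@"; do
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip: reachable"
        else
            echo "$ip: NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# Example call with the member IPs used in this thread:
# ping_members 10.3.0.201 10.3.0.202 10.3.0.203
```

Running it from every controller shows quickly whether one particular node is cut off from the VPN.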
Next, check whether etcd is really listening on ports 2379 and 2380 on all etcd nodes, e.g.:
sudo netstat -tlpn | grep -E "23[0-9]{2}"
tcp 0 0 10.3.0.201:2379 0.0.0.0:* LISTEN 21091/etcd
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 21091/etcd
tcp 0 0 10.3.0.201:2380 0.0.0.0:* LISTEN 21091/etcd
If that is also OK, I would try to connect to the etcd peer port 2380 (which is used for server-to-server communication) via telnet, e.g.:
telnet 10.3.0.201 2380
Trying 10.3.0.201...
Connected to 10.3.0.201.
Escape character is '^]'.
Or with netcat, checking the exit code (which should be 0), e.g.:
nc -w 2 10.3.0.201 2380
echo $?
0
If that works, then etcd server-to-server communication should at least be possible.
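To repeat the port check for every member and both ports, something like the following could be used. It is only a sketch: it assumes a netcat variant that supports -z (connect, then close without sending data), and check_etcd_ports is a name chosen for this example:

```shell
#!/bin/sh
# Attempt a TCP connection to the client port (2379) and the peer port (2380)
# of every given host. -z closes the connection right after it is
# established; -w 2 limits each attempt to 2 seconds.
# Returns non-zero if at least one port could not be reached.
check_etcd_ports() {
    failed=0
    for host in "$@"; do
        for port in 2379 2380; do
            if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
                echo "$host:$port open"
            else
                echo "$host:$port CLOSED or filtered"
                failed=1
            fi
        done
    done
    return "$failed"
}

# Example call with the member IPs used in this thread:
# check_etcd_ports 10.3.0.201 10.3.0.202 10.3.0.203
```

A port that shows up in the netstat output on the node itself but is reported as closed from another controller usually points to the VPN or a firewall rather than to etcd.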
The next problem could be the certificates, but I would first check the things mentioned above.
Thank you for your answer. I've been working on this issue for a while now and I got really tired of debugging PeerVPN, as there isn't much documentation available to work from. Anyway, I've started over on LXD for now (easier to snapshot / debug), and I'll give PeerVPN another go later on, once I get everything else working and move back to Scaleway cloud instances (the creation of new instances was really buggy all weekend, which might be related to their ongoing migration).
The good news is I seem to have a working etcd cluster running now! I did run into some other issues which might be related to the use of a btrfs filesystem instead of ext4. I'll document them even if they might not be worth modifying your code for (I'll let you be the judge of that). It might at least help someone else at some point.