crewjam/etcd-aws

Only one node starts etcd-aws


Hi,

I'm facing the following situation: after the cluster finishes creating, only one node is running the etcd-aws service. On the other two I see:
Failed Units: 1
etcd-aws.service

journalctl -xe
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: [etcd.service etcd2.service] are inactive
May 06 12:28:46 ip-10-242-131-220.ec2.internal locksmithd[619]: Unlocking old locks failed: [etcd.service etcd2.service] are inactive. Retrying in 5m0s.

The etcd-aws service (and its Docker container) only starts working if I start it by hand (as root, with systemctl start etcd-aws).

To recap: only one node starts etcd-aws after the CloudFormation deployment; on the other two the etcd-aws service has to be started by hand.

Any suggestions?

I added RestartSec=10 to /etc/systemd/system/etcd-aws.service and the problem seems to be gone.
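For reference, a minimal sketch of what that change might look like in the unit's [Service] section. Note that RestartSec= only has an effect when Restart= is also set; the Restart=on-failure line below is my assumption, not something taken from the project's cloud-config:

```ini
# Excerpt of /etc/systemd/system/etcd-aws.service (sketch, not the full unit).
[Service]
# Retry a failed start instead of giving up after the first attempt.
# The on-failure policy is an assumption; only RestartSec=10 comes from the comment above.
Restart=on-failure
# Wait 10 seconds before each retry, giving docker/networking time to come up.
RestartSec=10
```

After editing the unit, `systemctl daemon-reload` followed by `systemctl restart etcd-aws` makes systemd pick up the change. This only papers over the race, though: the first start still fails and a later retry succeeds.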

I noticed similar behavior; restarting early enough (I set it to 30 seconds) did seem to fix it reliably. I must have spun up over 30 nodes today without a repeat of this problem.

It's more likely that some dependency is still missing or not yet started when this service is started. I'll test some more later with a dependency on the docker service, along the lines of the sketch below.
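If it does turn out to be an ordering problem, the usual systemd fix would look roughly like this. It's only a sketch under the assumption that the unit launches its container via the docker daemon; I haven't verified it against this repo's cloud-config:

```ini
# Hypothetical [Unit] ordering/dependency stanza for etcd-aws.service.
[Unit]
# Don't start until the docker daemon and basic networking are up.
After=docker.service network-online.target
Requires=docker.service
Wants=network-online.target
```

Requires= pulls in docker.service, and After= makes etcd-aws wait until docker has actually started, which would remove the race instead of just retrying through it as RestartSec= does.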