projectcalico/bird

route reflector container with ETCD backend: BGP status hung in start state

Closed · 1 comment

tigera/calico-k8s-cluster has an install-rr script that starts the route reflector container:
https://github.com/tigera/calico-k8s-cluster/blob/master/gce/templates/rr-template.yaml#L68

The latest commit pushed to docker/quay causes the route reflector started by the install-rr script to hang in the start state with "Active Socket: Connection refused".

commit where behavior changed:
Add KDD as a supported backend 595266c

calicoctl node status output:

IPv4 BGP status
+--------------+-----------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE |  SINCE   |              INFO              |
+--------------+-----------+-------+----------+--------------------------------+
| 10.240.0.39  | global    | start | 21:42:08 | Active Socket: Connection      |
|              |           |       |          | refused                        |
| 10.240.0.41  | global    | start | 21:42:08 | Active Socket: Connection      |
|              |           |       |          | refused                        |
+--------------+-----------+-------+----------+--------------------------------+

Bird log in rr node:
2017-07-26_21:48:51.83931 bird: Unable to open configuration file /config/bird.cfg: No such file or directory

confd error:
2017-07-26T21:52:45Z tigera-scale-fd-rr1.c.unique-caldron-775.internal /confd[21]: ERROR 100: Key not found (/calico/bgp/v1/rr_v4) [27]

Workaround (run manually for each route reflector):
etcdctl set calico/bgp/v1/rr_v4/10.240.0.71 '{"ip": "10.240.0.71", "cluster_id": "1.0.0.0"}'
etcdctl set calico/bgp/v1/rr_v4/10.240.0.72 '{"ip": "10.240.0.72", "cluster_id": "1.0.0.0"}'

Not sure if this should just be a change in the install-rr script, documentation, or the image itself.

A temporary solution could also be to use the previous version of the RR image in the manifests.

Probably the nicest way to solve this is to remove the step of needing to specify the cluster ID. We should just auto-add the entry into etcd and accept an optional CLUSTER_ID environment variable. The easiest way to do that might be to add etcdctl or curl to the container and use it to add the entry.
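
A rough sketch of what that auto-registration could look like as a startup step, assuming etcdctl (v2 API) is added to the image and already configured to reach the cluster. RR_IP and DRY_RUN are hypothetical variable names introduced here for illustration, and the 1.0.0.0 default is simply the value used in the manual workaround, not something the image defines. The key and JSON shape are copied from the workaround and the confd error above.

```shell
#!/bin/sh
# Sketch only: register this route reflector in etcd before confd/BIRD start.
# Assumptions: RR_IP (hypothetical) holds this node's IPv4 address,
# CLUSTER_ID is optional, and etcdctl (v2 API) is on the PATH.

# Build the etcd key that confd reported as missing.
rr_key() {
    printf '/calico/bgp/v1/rr_v4/%s' "$1"
}

# Build the JSON value in the same shape as the manual workaround.
rr_value() {
    printf '{"ip": "%s", "cluster_id": "%s"}' "$1" "$2"
}

# Write the entry, falling back to 1.0.0.0 when no cluster ID is given.
# With DRY_RUN=1 (hypothetical), print the command instead of running it.
register_rr() {
    ip="$1"
    cluster_id="${2:-1.0.0.0}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'etcdctl set %s %s\n' "$(rr_key "$ip")" "$(rr_value "$ip" "$cluster_id")"
    else
        etcdctl set "$(rr_key "$ip")" "$(rr_value "$ip" "$cluster_id")"
    fi
}

# Only act when RR_IP is provided; otherwise leave etcd untouched.
if [ -n "${RR_IP:-}" ]; then
    register_rr "$RR_IP" "${CLUSTER_ID:-}"
fi
```

Running it with DRY_RUN=1 RR_IP=10.240.0.71 prints the etcdctl command it would execute, which makes it easy to compare against the manual workaround before wiring it into the container entrypoint.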