deis/workflow

Intermittent issues connecting to registry

rhenretta opened this issue · 1 comment

I'm running Deis Workflow v2.14.0 on Kubernetes v1.6.4, built with kops.

I used the CNI options when deploying Deis Workflow:

  use_cni: true
  registry_proxy_bind_addr: "127.0.0.1:5555"

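I tried deploying the example-go image. The first pull failed with a connection-refused error on the registry proxy address; an identical retry succeeded: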

$ deis create health-check --no-remote
Creating Application... done, created health-check
If you want to add a git remote for this app later, use `deis git:remote -a health-check`
$ deis pull deis/example-go:latest -a health-check
Creating build... Error: Unknown Error (400): {"detail":"(app::deploy): rpc error: code = 2 desc = Error while pulling image: Get http://127.0.0.1:5555/v1/repositories/health-check/images: dial tcp 127.0.0.1:5555: getsockopt: connection refused"}
$ deis pull deis/example-go:latest -a health-check
Creating build... done
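
To check whether the registry proxy is actually listening on a given node, a plain TCP probe of the bind address can help narrow things down. This is only a minimal sketch, assuming Python 3 is available on the node; `can_connect` is a hypothetical helper and not part of Deis, and the two addresses are the ones that appear in this report.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Addresses taken from this report: the proxy bind address on the node
    # and the registry service address the proxy is waiting for.
    for host, port in [("127.0.0.1", 5555), ("100.71.45.99", 80)]:
        state = "open" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} {state}")
```

Running it on each node in turn would show which nodes have nothing listening on 127.0.0.1:5555, which is what the "connection refused" error above points at.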

I checked the logs on all three deis-registry-proxy pods. One of the three is still waiting for the registry to come online:

2017-05-30T18:22:16.691091782Z waiting for the registry (100.71.45.99:80) to come online...
2017-05-30T18:24:24.946897342Z waiting for the registry (100.71.45.99:80) to come online...
2017-05-30T18:26:33.330851271Z waiting for the registry (100.71.45.99:80) to come online...
2017-05-30T18:28:41.586901797Z waiting for the registry (100.71.45.99:80) to come online...
2017-05-30T18:30:49.971669483Z waiting for the registry (100.71.45.99:80) to come online...

One came online as expected:

2017-05-30T16:07:31.983198358Z waiting for the registry (100.71.45.99:80) to come online...
2017-05-30T16:07:34.083123317Z waiting for the registry (100.71.45.99:80) to come online...
2017-05-30T16:07:35.27908879Z starting registry-proxy...
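
The two logs above differ in how far apart the retries are, and measuring the gaps makes that explicit. The following is a rough sketch, assuming the log format shown here (a leading timestamp with up to nanosecond precision, then the message); `parse_ts` and `retry_gaps` are hypothetical helpers written for this illustration.

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp, truncating nanoseconds to microseconds."""
    stamp = line.split()[0].rstrip("Z")
    date_part, _, frac = stamp.partition(".")
    frac = (frac + "000000")[:6]  # pad or truncate the fraction to 6 digits
    return datetime.strptime(f"{date_part}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")


def retry_gaps(lines):
    """Seconds elapsed between consecutive log lines."""
    times = [parse_ts(line) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Log lines copied from the pod that never gets past the wait loop.
stuck = [
    "2017-05-30T18:22:16.691091782Z waiting for the registry (100.71.45.99:80) to come online...",
    "2017-05-30T18:24:24.946897342Z waiting for the registry (100.71.45.99:80) to come online...",
    "2017-05-30T18:26:33.330851271Z waiting for the registry (100.71.45.99:80) to come online...",
    "2017-05-30T18:28:41.586901797Z waiting for the registry (100.71.45.99:80) to come online...",
    "2017-05-30T18:30:49.971669483Z waiting for the registry (100.71.45.99:80) to come online...",
]
# Log lines copied from the pod that started normally.
healthy = [
    "2017-05-30T16:07:31.983198358Z waiting for the registry (100.71.45.99:80) to come online...",
    "2017-05-30T16:07:34.083123317Z waiting for the registry (100.71.45.99:80) to come online...",
    "2017-05-30T16:07:35.27908879Z starting registry-proxy...",
]

print([round(g) for g in retry_gaps(stuck)])    # [128, 128, 128, 128]
print([round(g) for g in retry_gaps(healthy)])  # [2, 1]
```

The stuck pod logs a line roughly every 128 seconds, while the healthy one moved on within a second or two. That spacing may mean each attempt hangs until a connection timeout instead of being refused, but that is an inference from the timestamps, not something the logs state.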

The last shows the requests from the successful pull:

2017-05-30T18:29:27.249769638Z 127.0.0.1 - - [30/May/2017:18:29:27 +0000] "GET /v2/ HTTP/1.1" 200 2 "-" "docker/1.12.6 go/go1.6.4 git-commit/78d1802 kernel/4.4.65-k8s os/linux arch/amd64 UpstreamClient(docker-py/1.10.6)" "-"
2017-05-30T18:29:27.343427138Z 127.0.0.1 - - [30/May/2017:18:29:27 +0000] "GET /v2/health-check/manifests/v3 HTTP/1.1" 200 1146 "-" "docker/1.12.6 go/go1.6.4 git-commit/78d1802 kernel/4.4.65-k8s os/linux arch/amd64 UpstreamClient(docker-py/1.10.6)" "-"

This issue was moved to teamhephy/workflow#18