deis/registry

cannot upload docker container to registry

DavidSie opened this issue · 61 comments

When I build an app with a buildpack it works, but when I build a Docker container I cannot upload it to the registry.

 kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4", GitCommit:"dd6b458ef8dbf24aff55795baa68f83383c9b3a9", GitTreeState:"clean", BuildDate:"2016-08-01T16:45:16Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4+coreos.0", GitCommit:"be9bf3e842a90537e48361aded2872e389e902e7", GitTreeState:"clean", BuildDate:"2016-08-02T00:54:53Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
deis version
v2.4.0
     git push deis master
Counting objects: 589, done.
Compressing objects: 100% (416/416), done.
Writing objects: 100% (589/589), 2.47 MiB, done.
Total 589 (delta 46), reused 581 (delta 42)
Starting build... but first, coffee!
Step 1 : FROM ruby:2.0.0-p576
---> a137b6df82e8
Step 2 : COPY . /app
---> Using cache
---> a7107ea0f79a
Step 3 : WORKDIR /app
---> Using cache
---> ba2d0c3222ec
Step 4 : EXPOSE 3000
---> Using cache
---> 18f7fb188ed3
Step 5 : CMD while true; do echo hello world; sleep 1; done
---> Using cache
---> 4e22b0487484
Successfully built 4e22b0487484
Pushing to registry
{"errorDetail":{"message":"Put http://localhost:5555/v1/repositories/spree/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"},"error":"Put http://localhost:5555/v1/repositories/spree/: dial tcp 127.0.0.remote: getsockopt: connection refused"}

I know that there are environment variables pointing to this address:

     Environment Variables:
      DEIS_REGISTRY_SERVICE_HOST:   localhost
      DEIS_REGISTRY_SERVICE_PORT:   5555

but I don't understand why, since none of the pods and none of the services are listening on 5555.

services

kubectl get services --namespace=deis
NAME                     CLUSTER-IP   EXTERNAL-IP   PORT(S)                            AGE
deis-builder             10.3.0.233   <none>        2222/TCP                           1d
deis-controller          10.3.0.23    <none>        80/TCP                             1d
deis-database            10.3.0.253   <none>        5432/TCP                           1d
deis-logger              10.3.0.221   <none>        80/TCP                             1d
deis-logger-redis        10.3.0.148   <none>        6379/TCP                           1d
deis-minio               10.3.0.232   <none>        9000/TCP                           1d
deis-monitor-grafana     10.3.0.113   <none>        80/TCP                             1d
deis-monitor-influxapi   10.3.0.234   <none>        80/TCP                             1d
deis-monitor-influxui    10.3.0.141   <none>        80/TCP                             1d
deis-nsqd                10.3.0.82    <none>        4151/TCP,4150/TCP                  1d
deis-registry            10.3.0.188   <none>        80/TCP                             1d
deis-router              10.3.0.133   <pending>     80/TCP,443/TCP,2222/TCP,9090/TCP   1d
deis-workflow-manager    10.3.0.34    <none>        80/TCP                             1d

pods


kubectl describe  pods deis-registry-3758253254-3gtjo   --namespace=deis 
Name:       deis-registry-3758253254-3gtjo
Namespace:  deis
Node:       10.63.11.75/10.63.11.75
Start Time: Mon, 22 Aug 2016 10:36:12 +0000
Labels:     app=deis-registry
        pod-template-hash=3758253254
Status:     Running
IP:     10.2.12.12
Controllers:    ReplicaSet/deis-registry-3758253254
Containers:
  deis-registry:
    Container ID:   docker://78d6d569eefac3766e4b921f21b7847d36866a266ae76424d7d6e572bb2f5979
    Image:      quay.io/deis/registry:v2.2.0
    Image ID:       docker://sha256:0eb83b180d1aa993fcdd715e4b919b4867051d4f35a813a56eec04ae0705d3d1
    Port:       5000/TCP
    State:      Running
      Started:      Mon, 22 Aug 2016 10:43:05 +0000
    Ready:      True
    Restart Count:  0
    Liveness:       http-get http://:5000/v2/ delay=1s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:5000/v2/ delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment Variables:
      REGISTRY_STORAGE_DELETE_ENABLED:  true
      REGISTRY_LOG_LEVEL:       info
      REGISTRY_STORAGE:         minio
Conditions:
  Type      Status
  Initialized   True 
  Ready     True 
  PodScheduled  True 
Volumes:
  registry-storage:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  registry-creds:
    Type:   Secret (a volume populated by a Secret)
    SecretName: objectstorage-keyfile
  deis-registry-token-inyyj:
    Type:   Secret (a volume populated by a Secret)
    SecretName: deis-registry-token-inyyj
QoS Tier:   BestEffort
No events.


kubectl describe  pods deis-registry-proxy-cpu68    --namespace=deis 
Name:       deis-registry-proxy-cpu68
Namespace:  deis
Node:       10.63.11.76/10.63.11.76
Start Time: Mon, 22 Aug 2016 10:36:31 +0000
Labels:     app=deis-registry-proxy
        heritage=deis
Status:     Running
IP:     10.2.63.4
Controllers:    DaemonSet/deis-registry-proxy
Containers:
  deis-registry-proxy:
    Container ID:   docker://dc29ab400a06ae5dc1407c7f1fb0880d4257720170eded6a7f8cde5431fa9570
    Image:      quay.io/deis/registry-proxy:v1.0.0
    Image ID:       docker://sha256:fde297ec95aa244e5be48f438de39a13dae16a1593b3792d8c10cd1d7011f8d1
    Port:       80/TCP
    Limits:
      cpu:  100m
      memory:   50Mi
    Requests:
      cpu:      100m
      memory:       50Mi
    State:      Running
      Started:      Mon, 22 Aug 2016 10:38:32 +0000
    Ready:      True
    Restart Count:  0
    Environment Variables:
      REGISTRY_HOST:    $(DEIS_REGISTRY_SERVICE_HOST)
      REGISTRY_PORT:    $(DEIS_REGISTRY_SERVICE_PORT)
Conditions:
  Type      Status
  Initialized   True 
  Ready     True 
  PodScheduled  True 
Volumes:
  default-token-tk993:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-tk993
QoS Tier:   Guaranteed
No events.


From the pod list it looks like the registry-proxy component is missing, which is what proxies requests to the registry. Can you confirm with kubectl --namespace=deis get daemonsets?

There are registry proxies. I attached one above, but there are 3 identical ones (I'm using 1 master + 2 minions).

kubectl --namespace=deis get daemonsets
NAME                    DESIRED   CURRENT   NODE-SELECTOR   AGE
deis-logger-fluentd     3         3         <none>          1d
deis-monitor-telegraf   3         3         <none>          1d
deis-registry-proxy     3         3         <none>          1d

Okay, so if you do indeed have registry proxies then you're probably hitting the same issue as #62, since your app relies on the ruby image which is relatively large. I would take a look into that issue and see if you find similar behaviour.

According to Docker Hub (https://hub.docker.com/r/library/ruby/tags/) it's only 313 MB; I would say that's average.
Are you sure that the address localhost:5555 makes sense, since the deis-registry service is 10.3.0.188 <none> 80/TCP and the deis-registry-3758253254-3gtjo pod is listening on port 5000?

Yes, that address is correct. The request goes through the registry-proxy, which (as the name suggests) proxies the request to the real registry. It's a workaround for the --insecure-registry flag. See https://github.com/deis/registry-proxy#about
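If you want to see that mapping for yourself, here is a hedged sketch (assuming the hostPort is declared on the first container port of the daemonset, which matches the describe output above):

# print the hostPort that the registry-proxy requests on every node (should be 5555)
kubectl --namespace=deis get daemonset deis-registry-proxy \
  -o jsonpath='{.spec.template.spec.containers[0].ports[0].hostPort}'

The proxy's nginx then forwards those requests to the deis-registry service (port 80) using the REGISTRY_HOST/REGISTRY_PORT environment variables shown in the pod description above.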

Coming back to the original problem, I'd inspect both your registry and minio to ensure that there are no problems with either backend. From reports, it seems that images built via Dockerfile that are slightly larger than normal (>100 MB) are causing these issues.

This is not a container-size issue (alpine is 2 MB: https://hub.docker.com/r/library/alpine/tags/):

 git push deis master
Counting objects: 48, done.
Compressing objects: 100% (47/47), done.
Writing objects: 100% (48/48), 6.35 KiB, done.
Total 48 (delta 14), reused 0 (delta 0)
Starting build... but first, coffee!
...
Step 1 : FROM alpine
---> 4e38e38c8ce0
Step 2 : ENV GOPATH /go
---> Using cache
---> bd4d962b7a6e
Step 3 : ENV GOROOT /usr/local/go
---> Using cache
---> 346b304d9d9d
Step 4 : ENV PATH $PATH:/usr/local/go/bin:/go/bin
---> Using cache
---> bfd14db2b7e7
Step 5 : EXPOSE 80
---> Using cache
---> a019f2dadbcc
Step 6 : ENTRYPOINT while true; do echo hello world; sleep 1; done
---> Using cache
---> d500b7d348cb
Successfully built d500b7d348cb
Pushing to registry
{"errorDetail":{"message":"Put http://localhost:5555/v1/repositories/gaslit-gladness/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"},"error":"Put http://localhost:5555/v1/repositories/gaslit-gladnessremote: tcp 127.0.0.1:5555: getsockopt: connection refused"}

remote: 2016/08/25 07:18:46 Error running git receive hook [Build pod exited with code 1, stopping build.]
To ssh://git@deis-builder.10.63.11.83.nip.io:2222/gaslit-gladness.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@deis-builder.10.63.11.83.nip.io:2222/gaslit-gladness.git'

Which container should listen on port 5555?

(This is a different cluster but from the same script)

Which container should listen on port 5555?

The registry-proxy listens on port 5555.

Can you please provide the following information so we can try to reproduce this?

  • kubectl version
  • how you provisioned your kubernetes cluster

I recall that there are internal networking issues when using CoreOS with Calico: deis/workflow#442

from inside the container

root@deis-registry-proxy-jzf3h:/# telnet localhost 5555
Trying ::1...
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused

root@deis-registry-proxy-jzf3h:/# netstat  -lntpu 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1/nginx: master pro

kubectl version:

kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4", GitCommit:"dd6b458ef8dbf24aff55795baa68f83383c9b3a9", GitTreeState:"clean", BuildDate:"2016-08-01T16:45:16Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.4+coreos.0", GitCommit:"be9bf3e842a90537e48361aded2872e389e902e7", GitTreeState:"clean", BuildDate:"2016-08-02T00:54:53Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

to provision kubernetes cluster I used this tutorial: https://coreos.com/kubernetes/docs/latest/getting-started.html

The fact that you cannot connect to localhost:5555 from within the container is to be expected. We actually mount the host's docker socket, so any command we run assumes the host's network. Therefore, localhost:5555 on the host belongs to the registry-proxy.
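A quick way to check this from the node itself rather than from inside a pod - a hedged sketch, assuming SSH access as the core user on the CoreOS node listed in the pod description earlier:

ssh core@10.63.11.75
# on the node: is anything bound to the host's port 5555?
# with a working hostPort you would normally see a listener here (docker-proxy or nginx);
# an empty result reproduces the "connection refused" from the build log above
sudo ss -lntp | grep 5555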

When you provisioned kubernetes, where did you deploy your cluster? AWS, GKE, Vagrant?

I did it on Openstack

We had the exact same problem, with:

coreos-kubernetes (from github repo #1876aac with kubernetes 1.3.4)
deis 2.4.0
vagrant 1.8.5

To create our Kubernetes cluster we followed the tutorial here: https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html

After quite a bit of struggling (turning off Calico, changing the hostPort from 5555 to 80, etc. - nothing changed) we resolved it by using the plain version of Kubernetes, from the main Deis tutorial here: https://deis.com/docs/workflow/quickstart/provider/vagrant/boot/

with the notable change of downgrading Vagrant to 1.8.3, since 1.8.5 has this bug: hashicorp/vagrant#5186 (it's marked as closed but there's a regression in 1.8.5).

So, for us, the problem was in the CoreOS package. We haven't tried the very latest commit though.

EDIT: we also tried the latest commit from the CoreOS repository (commit #bdfe006) with Deis 2.4.1; nothing changed.

@think01 So you think that the kubelet-wrapper provided with CoreOS may be a cause of this problem, right?

@DavidSie well, I cannot say the problem is in that component, but we solved it by avoiding the coreos-kubernetes package and going with plain Kubernetes on Vagrant (which creates some Fedora boxes).

Why do you mention kubelet-wrapper?

Because I saw that CoreOS ships with the script /usr/lib/coreos/kubelet-wrapper, but as far as I can see it only starts hyperkube on rkt.

ping @DavidSie, were you able to identify the root cause of your issue here?

I am experiencing what I think is a similar issue. My image is 385.9M (so it's >100M as mentioned by @bacongobbler). Regarding "inspecting" the backend - I cannot figure out how to get helpful logging out of the minio pod. I've tried the --debug switch in various permutations, then found minio/minio#820, which seems to indicate that it's no longer valid because it's not needed. I've also tried setting MINIO_TRACE=1 per some code fragments I found. However, kubectl --namespace=deis logs deis-minio-123xyz only ever shows what I assume is the minio startup output - there's no debug log, no trace log, nothing that indicates the behavior of minio during operation.
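A hedged sketch of the kind of basic checks that can help here (deis-minio-123xyz is a placeholder pod name, and the app=deis-minio label is assumed from the naming pattern of the other components):

# confirm the minio pod is actually Running and see which node it landed on
kubectl --namespace=deis get pods -l app=deis-minio -o wide
# recent log output, plus any scheduling or probe events
kubectl --namespace=deis logs --tail=100 deis-minio-123xyz
kubectl --namespace=deis describe pod deis-minio-123xyz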

The first time:
deis pull

2016-09-21 08:28:43
rbellamy@eanna i ~/Development/Terradatum/aergo/aergo-server feature/docker % deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
Creating build... Error: Unknown Error (400): {"detail":"dial tcp 10.11.28.91:9000: i/o timeout"}
zsh: exit 1     deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server

controller logs

INFO [aergo-server]: build aergo-server-11b3c2a created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO [aergo-server]: dial tcp 10.11.28.91:9000: i/o timeout
ERROR:root:dial tcp 10.11.28.91:9000: i/o timeout
Traceback (most recent call last):
  File "/app/api/models/release.py", line 88, in new
    release.publish()
  File "/app/api/models/release.py", line 135, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 199, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 117, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 135, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 178, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 195, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: dial tcp 10.11.28.91:9000: i/o timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 95, in new
    raise DeisException(str(e)) from e
api.exceptions.DeisException: dial tcp 10.11.28.91:9000: i/o timeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: dial tcp 10.11.28.91:9000: i/o timeout
10.10.2.8 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 51 "Deis Client v2.5.1"

Then immediately, I try again:
deis pull

2016-09-21 08:42:27
rbellamy@eanna i ~/Development/Terradatum/aergo/aergo-server feature/docker % deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server
Creating build... Error: Unknown Error (502): <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.11.2</center>
</body>
</html>

zsh: exit 1     deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server

controller logs

INFO [aergo-server]: build aergo-server-c09bb9b created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v4
INFO Pushing Docker image localhost:5555/aergo-server:v4
INFO Pushing Docker image localhost:5555/aergo-server:v4
10.10.2.8 "GET /v2/apps/aergo-server/logs HTTP/1.1" 200 1284 "Deis Client v2.5.1"
INFO Pushing Docker image localhost:5555/aergo-server:v4
[2016-09-21 16:05:50 +0000] [24] [CRITICAL] WORKER TIMEOUT (pid:37)
[2016-09-21 16:05:50 +0000] [37] [WARNING] worker aborted
  File "/usr/local/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/wsgiapp.py", line 74, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/base.py", line 192, in run
    super(Application, self).run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 189, in run
    self.manage_workers()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 524, in manage_workers
    self.spawn_workers()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 590, in spawn_workers
    self.spawn_worker()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/arbiter.py", line 557, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base.py", line 132, in init_process
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 124, in run
    self.run_for_one(timeout)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 30, in accept
    self.handle(listener, client, addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 176, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/wsgi.py", line 170, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 124, in get_response
    response = self._middleware_chain(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/utils/deprecation.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/app/api/middleware.py", line 22, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/viewsets.py", line 87, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 88, in new
    release.publish()
  File "/app/api/models/release.py", line 135, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 199, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 117, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 135, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 175, in log_output
    for chunk in stream:
  File "/usr/local/lib/python3.5/dist-packages/docker/client.py", line 245, in _stream_helper
    data = reader.read(1)
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/response.py", line 314, in read
    data = self._fp.read(amt)
  File "/usr/lib/python3.5/http/client.py", line 448, in read
    n = self.readinto(b)
  File "/usr/lib/python3.5/http/client.py", line 478, in readinto
    return self._readinto_chunked(b)
  File "/usr/lib/python3.5/http/client.py", line 573, in _readinto_chunked
    chunk_left = self._get_chunk_left()
  File "/usr/lib/python3.5/http/client.py", line 541, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/usr/lib/python3.5/http/client.py", line 501, in _read_next_chunk_size
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/base.py", line 191, in handle_abort
    self.cfg.worker_abort(self)
  File "/app/deis/gunicorn/config.py", line 36, in worker_abort
    traceback.print_stack()

@rbellamy can you post registry logs in a gist? That will likely give us more information about why the registry is failing to communicate with minio.
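Something along these lines should capture them (the pod name is taken from the describe output earlier in this thread; substitute your own):

kubectl --namespace=deis get pods -l app=deis-registry
kubectl --namespace=deis logs deis-registry-3758253254-3gtjo > deis-registry.log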

@bacongobbler will do.

Also, this may be related to minio/minio#2743.

Here's my setup, using the Alpha channel of CoreOS and libvirt:

export KUBERNETES_PROVIDER=libvirt-coreos && export NUM_NODES=4
./cluster/kube-up.sh
# wait for etcd to settle
helmc install workflow-v2.5.0
# wait for kubernetes cluster to all be ready
deis pull 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT -a aergo-server

I worked with @harshavardhana from the minio crew to try to troubleshoot this.

For whatever reason, during our teleconsole session I was able to successfully push the image through the deis-registry-proxy - but then saw the same dial i/o timeout in a different context. This time it occurred while pulling the image from the proxy, during the app:deploy phase.

NOTE: you can ignore the 404 below - v4 of the aergo-server doesn't exist since I've restarted the minio pod several times during troubleshooting. The v5 release is definitely stored in minio, as can be seen in the mc ls command at the bottom of this post.

INFO [aergo-server]: build aergo-server-49c7405 created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v5
INFO Pushing Docker image localhost:5555/aergo-server:v5
INFO Pulling Docker image localhost:5555/aergo-server:v5
INFO [aergo-server]: adding 5s on to the original 120s timeout to account for the initial delay specified in the liveness / readiness probe
INFO [aergo-server]: This deployments overall timeout is 125s - batch timout is 125s and there are 1 batches to deploy with a total of 1 pods
INFO [aergo-server]: waited 10s and 1 pods are in service
INFO [aergo-server]: waited 20s and 1 pods are in service
INFO [aergo-server]: waited 30s and 1 pods are in service
INFO [aergo-server]: waited 40s and 1 pods are in service
ERROR [aergo-server]: There was a problem deploying v5. Rolling back process types to release v4.
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
INFO Pulling Docker image localhost:5555/aergo-server:v4
ERROR [aergo-server]: (app::deploy): image aergo-server:v4 not found
ERROR:root:(app::deploy): image aergo-server:v4 not found
Traceback (most recent call last):
  File "/app/scheduler/__init__.py", line 168, in deploy
    deployment = self.deployment.get(namespace, name).json()
  File "/app/scheduler/resources/deployment.py", line 29, in get
    raise KubeHTTPException(response, message, *args)
scheduler.exceptions.KubeHTTPException: ('failed to get Deployment "aergo-server-cmd" in Namespace "aergo-server": 404 Not Found', 'aergo-server-cmd', 'aergo-server')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/models/app.py", line 578, in deploy
    async_run(tasks)
  File "/app/api/utils.py", line 169, in async_run
    raise error
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/app/api/utils.py", line 182, in async_task
    yield from loop.run_in_executor(None, params)
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/scheduler/__init__.py", line 175, in deploy
    namespace, name, image, entrypoint, command, **kwargs
  File "/app/scheduler/resources/deployment.py", line 123, in create
    self.wait_until_ready(namespace, name, **kwargs)
  File "/app/scheduler/resources/deployment.py", line 338, in wait_until_ready
    additional_timeout = self.pod._handle_pending_pods(namespace, labels)
  File "/app/scheduler/resources/pod.py", line 552, in _handle_pending_pods
    self._handle_pod_errors(pod, reason, message)
  File "/app/scheduler/resources/pod.py", line 491, in _handle_pod_errors
    raise KubeException(message)
scheduler.exceptions.KubeException: error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/models/release.py", line 168, in get_port
    port = docker_get_port(self.image, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 203, in get_port
    return DockerClient().get_port(target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 79, in get_port
    info = self.inspect_image(target)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 156, in inspect_image
    self.pull(repo, tag=tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 128, in pull
    log_output(stream, 'pull', repo, tag)
  File "/app/registry/dockerclient.py", line 178, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 195, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: image aergo-server:v4 not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/app.py", line 585, in deploy
    self.deploy(release.previous(), force_deploy=True, rollback_on_failure=False)
  File "/app/api/models/app.py", line 526, in deploy
    port = release.get_port()
  File "/app/api/models/release.py", line 176, in get_port
    raise DeisException(str(e)) from e
api.exceptions.DeisException: image aergo-server:v4 not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 64, in create
    self.app.deploy(new_release)
  File "/app/api/models/app.py", line 595, in deploy
    raise ServiceUnavailable(err) from e
api.exceptions.ServiceUnavailable: (app::deploy): image aergo-server:v4 not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: (app::deploy): image aergo-server:v4 not found
10.10.2.8 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 59 "Deis Client v2.5.1"

And as you can see, the minio store definitely contains the image, and the proxy can communicate with the minio backend:

root@deis-registry-proxy-ccf4u:~# mc ls myminio/registry -r
[2016-09-21 19:47:36 UTC] 1.5KiB docker/registry/v2/blobs/sha256/2f/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/data
[2016-09-21 19:46:45 UTC]   112B docker/registry/v2/blobs/sha256/53/5345ff73e9fcf7b6c7d2d7eca2b0338ab274560ff988b8f63e60f73dfe0297ec/data
[2016-09-21 19:47:36 UTC] 5.0KiB docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data
[2016-09-21 19:46:45 UTC]   232B docker/registry/v2/blobs/sha256/a6/a696cba1f6e865421664a7bf9bf585bcfaa924d56b7d2a112a799e00a7433791/data
[2016-09-21 19:47:14 UTC]  94MiB docker/registry/v2/blobs/sha256/b4/b419440b08d223eabe64f26d5f8556ee8d3f4c0bcafb8dd64ec525cc4eea7f6e/data
[2016-09-21 19:47:19 UTC]  94MiB docker/registry/v2/blobs/sha256/c0/c0963e676944ab20c36e857c33d76a6ba2166aaa6a0d3961d6cf20fae965efd0/data
[2016-09-21 19:47:14 UTC]  47MiB docker/registry/v2/blobs/sha256/d0/d0f0d61cd0d229546b1e33b0c92036ad3f35b42dd2c9a945aeaf67f84684ce26/data
[2016-09-21 19:46:59 UTC] 2.2MiB docker/registry/v2/blobs/sha256/e1/e110a4a1794126ef308a49f2d65785af2f25538f06700721aad8283b81fdfa58/data
[2016-09-21 19:46:45 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/5345ff73e9fcf7b6c7d2d7eca2b0338ab274560ff988b8f63e60f73dfe0297ec/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/link
[2016-09-21 19:46:45 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/a696cba1f6e865421664a7bf9bf585bcfaa924d56b7d2a112a799e00a7433791/link
[2016-09-21 19:47:18 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/b419440b08d223eabe64f26d5f8556ee8d3f4c0bcafb8dd64ec525cc4eea7f6e/link
[2016-09-21 19:47:19 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/c0963e676944ab20c36e857c33d76a6ba2166aaa6a0d3961d6cf20fae965efd0/link
[2016-09-21 19:47:18 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/d0f0d61cd0d229546b1e33b0c92036ad3f35b42dd2c9a945aeaf67f84684ce26/link
[2016-09-21 19:46:59 UTC]    71B docker/registry/v2/repositories/aergo-server/_layers/sha256/e110a4a1794126ef308a49f2d65785af2f25538f06700721aad8283b81fdfa58/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_manifests/revisions/sha256/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_manifests/tags/v5/current/link
[2016-09-21 19:47:36 UTC]    71B docker/registry/v2/repositories/aergo-server/_manifests/tags/v5/index/sha256/2fc6d0a3ec447743456f6fe782622ede8095b662bb39cb10c50b2a795829e51f/link

@bacongobbler - if you have a setup locally we can work on this and see what is causing the problem. I don't have a Kubernetes setup locally. The i/o timeout seems to be related to a network problem between the registry and the minio server. We need to see whether the server itself is failing to respond properly; I couldn't see it with mc though.
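One way to check that path without a local cluster is to test it from inside the registry pod itself - a hedged sketch, assuming the Alpine-based registry image provides busybox wget and that Kubernetes injected the usual DEIS_MINIO_SERVICE_HOST service variable (replace <deis-registry-pod> with the real pod name):

kubectl --namespace=deis exec <deis-registry-pod> -- \
  sh -c 'wget -T 5 -O- "http://${DEIS_MINIO_SERVICE_HOST}:9000/" || true'
# an immediate HTTP error (e.g. an S3-style AccessDenied response) still proves the route
# to minio works; hanging for the full 5s reproduces the dial i/o timeout in the traceback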

@harshavardhana unfortunately we do not have any clusters reproducing this issue locally nor can we reproduce it ourselves, other than for the calico networking issue.

@rbellamy if you can supply information about how you set up your cluster, including your KUBERNETES_PROVIDER envvar when using kube-up.sh and what version of Workflow you're running, we can try to reproduce there. As far as e2e is concerned we aren't seeing this issue in master or in recent releases: http://ci.deis.io

@bacongobbler I included that information in a comment in this issue: #64 (comment)

Thank you! From what others have voiced earlier, this sounds related to a CoreOS issue, as seen earlier in #64 (comment). I'd recommend trying a different provider first to see if that resolves your issue.

I'm not sure how diagnostic this is, given I'm testing within a single libvirt host - however, it should be noted that the host is running 2 x 12 AMD Opteron CPUs on a Supermicro motherboard with 128 GB RAM and all SSDs, and each VM is provisioned with 4 GB and 2 CPUs, so I find it hard to believe the issue is an overloaded VM host or guest.

From what @bacongobbler has said, deis hasn't seen this in their e2e test runner on k8s. I'd be interested to know what the test matrix looks like WRT other providers/hosts.

Maybe this is a CoreOS-related problem? Given coreos/bugs#1554 it doesn't seem outside the realm of possibility.

Kubernetes on CoreOS (using the libvirt-coreos provider and the ./kube-up.sh script):

  • master with 4 nodes (5 total VMs) does not work (see errors above)
  • master with 3 nodes (4 total VMs) does not work (see errors in this comment)
  • master with 2 nodes (3 total VMs) works

master with 3 nodes

INFO [aergo-server]: build aergo-server-6972f5f created
INFO [aergo-server]: rbellamy deployed 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Pulling Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT
INFO Tagging Docker image 192.168.57.10:5000/aergo-server:1.0.0-SNAPSHOT as localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO Pushing Docker image localhost:5555/aergo-server:v2
INFO [aergo-server]: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
ERROR:root:Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
Traceback (most recent call last):
  File "/app/api/models/release.py", line 88, in new
    release.publish()
  File "/app/api/models/release.py", line 135, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 199, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 117, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 135, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 178, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 195, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 95, in new
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/aergo-server/: read tcp 127.0.0.1:49384->127.0.0.1:5555: read: connection reset by peer
10.10.1.5 "POST /v2/apps/aergo-server/builds/ HTTP/1.1" 400 142 "Deis Client v2.5.1"

Maybe this is a CoreOS-related problem? Given coreos/bugs#1554 it doesn't seem outside the realm of possibility.

Yes, I do believe this is a CoreOS-related problem, as I mentioned in my previous comment. If you can try provisioning a cluster with a different provider, that would help narrow down the issue.

@bacongobbler I've used corectl and Kube-Solo with success.

@DavidSie after reading the logs a little more closely, I realized that your docker daemon appears to be trying to push to a v1 registry endpoint.

Put http://localhost:5555/v1/repositories/spree/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"

Notice the v1 in there. Since buildpack deploys work fine for you, this is directly related to dockerbuilder, and I wonder if it's due to the docker python library auto-detecting the client version: https://github.com/deis/dockerbuilder/blob/28c31d45a17a97473e83c451b0d2e743678620c0/rootfs/deploy.py#L106
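A rough way to confirm it's the client side and not the proxy/registry - a hedged sketch, run from a node where the proxy's hostPort is actually working:

# the Docker Registry v2 API answers its version check with HTTP 200 and an empty JSON body
curl -s -o /dev/null -w 'v2 ping: %{http_code}\n' http://localhost:5555/v2/
# a v1 endpoint should come back 404 from a v2-only registry, so a push hitting
# /v1/repositories/... means the pushing client fell back to the legacy v1 protocol
curl -s -o /dev/null -w 'v1 ping: %{http_code}\n' http://localhost:5555/v1/_ping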

@rbellamy can you please open a separate issue? Yours doesn't look to be the same, since the original error from your report is about minio:

error pulling image configuration: Get http://10.11.28.91:9000/registry/docker/registry/v2/blobs/sha256/59/5905a7c362fbff9626d517a6ba0d8930fba34a321ba4c7bb718144d80cfaf29b/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=8TZRY2JRWMPT6UMXR6I5%2F20160921%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20160921T194800Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=314c92bb84dbd4dd41f9bc572e625201a32ce300394d34e8516a57382fd2ec52: dial tcp 10.11.28.91:9000: i/o timeout

Yes, @rbellamy believes he has nailed it down as a symptom of coreos/bugs#1554. Thank you for the offer, though!

@bacongobbler
Do you know how I can fix this issue? Is it enough to simply update Deis (I'm currently using 2.3.0)?

I'm not sure how this could be fixed; however, using 2.5.0 would never hurt.

I ran into this exact problem when setting up using the CoreOS tool as well. It's too bad that CoreOS aws-cli has this problem, because the CoreOS tool works really well with CloudFormation, which makes teardown a snap after trying out Deis. kube-up does not use CloudFormation and leaves crap all over your AWS account after you're done with it.

@dblackdblack even after using ./cluster/kube-down.sh? I've always found that script tears down all the AWS resources it created.

So after debugging with both @jdumars and @felixbuenemann, both clusters seem to be showing the same symptom. The problem? Requesting a hostPort on some providers - like Rancher and CoreOS - does not work. @kmala pointed me towards kubernetes/kubernetes#23920 so it looks like we found our smoking gun.

And for anyone who wants to take a crack at trying a patch, they can run through the following instructions to patch workflow-v2.7.0, removing registry-proxy and making the controller and builder connect directly with the registry. This will require the old --insecure-registry flag to be enabled so the docker daemon can talk to the registry, but here are the commands and the patch to run on a fresh cluster that shows this symptom:

git clone https://github.com/deis/charts
cd charts
curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/992a95edb8430ebcddba526fb1c48d9d0fcc1166/remove-registry-proxy.patch | git apply -
kubectl delete namespace deis
# also delete any app namespaces so you have a fresh cluster
rm -rf ~/.helmc/workspace/charts/workflow-v2.7.0
cp -R workflow-v2.7.0 ~/.helmc/workspace/charts/
helmc generate workflow-v2.7.0
helmc install workflow-v2.7.0

Note that this will purge your cluster entirely of Workflow.

There is currently no workaround for this as far as I'm aware, but if users want to bring this issue to light they can try to contribute patches upstream to kubernetes! :)

@zinuzoid the instructions above use that exact patch :)

EDIT: I missed the one line change you made in your patch and the fact it's for workflow-dev. Nice catch!

@bacongobbler plus one line in workflow-dev/tpl/storage.sh for me to make it work :)

I'm going to close this issue as there is nothing we can do here to work around this issue in Workflow other than with the patch I provided. This is an upstream issue and patches should be applied upstream. Until then please feel free to run with the patch provided here for production deployments that rely on CNI networking. Thanks!

When applying the patch I got this "corrupt patch at line 6" message:
mbr-31107:charts jwalters$ curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/32a86cc4ddfa0a7cb173b1184ac3e288dedb5a84/remove-registry-proxy.patch | git apply -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3080 100 3080 0 0 3557 0 --:--:-- --:--:-- --:--:-- 3556
fatal: corrupt patch at line 6

@jwalters-gpsw try again. I just fixed the patch.

curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/992a95edb8430ebcddba526fb1c48d9d0fcc1166/remove-registry-proxy.patch | git apply -

v2.8.0 patch:

curl https://gist.githubusercontent.com/bacongobbler/0b5f2c4fe6f067ddb775d53d635cc74d/raw/248a052dd0575419d5890abaedec3a7940f3ada6/remove-registry-proxy-v2.8.0.patch | git apply -

Thanks for the updated patch. I'm running CoreOS on AWS. Is there a way for me to restart the docker daemons with the insecure-registry option, or would I need to redeploy the cluster?

It's easier to re-deploy the cluster if you're just getting set up. Otherwise you'll have to manually SSH into each node, modify the daemon startup flags, and restart docker on every node.
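On CoreOS that usually means a systemd drop-in rather than editing the unit directly - a minimal sketch, assuming the stock docker.service honours $DOCKER_OPTS and that 10.3.0.0/16 is your service CIDR (it matches the service IPs listed earlier in this thread); the drop-in file name is arbitrary:

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/50-insecure-registry.conf <<'EOF'
[Service]
Environment="DOCKER_OPTS=--insecure-registry=10.3.0.0/16"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker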

Thanks, I will give that a try. I'm also thinking about doing a Deis upgrade to the same version per the upgrade instructions, but setting the registry to an off-cluster registry.

I manually updated the worker nodes' docker config and applied your changes, and it's working fine now.

ineu commented

Sorry for raising this old thread, but could you please explain how to apply this patch to the 2.9 which is deployed via helm and not helm classic?

You can fetch the chart locally via helm fetch deis/workflow --version=v2.9.1 --untar, modify the chart with the patch (which you'll have to manually apply since it's not in git), then install it :)

ineu commented

Thank you

I patched the latest helm workflow charts with @bacongobbler's suggested fixes from #64 (comment): https://github.com/anubhavmishra/workflow.
Also make sure you are using the insecure-registry option for Docker, as suggested here: https://deis.com/docs/workflow/en/v2.2.0/installing-workflow/system-requirements/#docker-insecure-registry

For v2.15.0, the recipe will be:

helm fetch deis/workflow --version=v2.15.0 --untar
cd workflow
curl https://gist.githubusercontent.com/IlyaSemenov/a8f467934cb5f1f0963469cd3eb32ace/raw/b3e8fcb5dd9094b50014177f5db72210b2949883/0001-Remove-proxy.patch | patch -p1
helm upgrade deis .

Don't forget to enable the insecure registry in /lib/systemd/system/docker.service on your Docker host(s):

ExecStart=/usr/bin/dockerd -H fd:// --insecure-registry=10.43.0.0/16
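After editing the unit, reload systemd and restart docker, then verify; newer Docker versions list the configured networks under "Insecure Registries" in docker info:

sudo systemctl daemon-reload && sudo systemctl restart docker
docker info | grep -A1 -i 'insecure registries'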

Removing the registry proxy should no longer be needed with current versions of the Deis helm charts; you can set the following in your deis-workflow values.yml if you are using CNI:

global:
  host_port: 5555
  use_cni: true
  registry_proxy_bind_addr: "127.0.0.1:5555"
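To apply that, something like the following should work (a sketch; "deis" is assumed as the release name, matching the helm upgrade command earlier in the thread):

helm upgrade deis deis/workflow -f values.yml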

It's not working on Kubernetes 1.5.4 provisioned with Rancher 1.6.2 (latest).

I think this is the related issue rancher/rancher#5857.

I'm new to Deis and I'm encountering all kinds of problems in my journey to deploy Deis on AWS.
The latest one is when I try to deploy a Docker image to Deis. For example, for a pgadmin4 Docker image, when running deis pull ephillipe/pgadmin4 I get this error:
Creating build... Error: Unknown Error (400): {"detail":"Put http://127.0.0.1:5555/v1/repositories/pgadmin4/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"}
I checked the running daemonsets: kubectl --namespace=deis get daemonsets and I'm getting:

deis-logger-fluentd     2         2         2         2            2           <none>          6d
deis-monitor-telegraf   2         2         2         2            2           <none>          6d
deis-registry-proxy     0         0         0         0            0           <none>          6d

So clearly the problem is that deis-registry-proxy is not running.

Can anyone help me with this issue?
How can I start deis-registry-proxy, or if that's not the solution, how else can I deploy a Docker image?

@IulianParaian I would try the Deis Slack for troubleshooting. It might be that your registry proxies are crashing because the internal registry is unreachable.
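A few hedged checks that would show whether that's the case (the app=deis-registry-proxy label comes from the pod description earlier in this thread):

# the daemonset reports DESIRED 0, so check why no pods are being scheduled
kubectl --namespace=deis describe daemonset deis-registry-proxy
kubectl --namespace=deis get pods -l app=deis-registry-proxy -o wide
# and confirm the registry itself is healthy
kubectl --namespace=deis get pods -l app=deis-registry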

@felixbuenemann I did try the Slack first but didn't get any responses. And I also couldn't find good documentation or a simple example of how to deploy an app from a Docker image/Dockerfile. I'm not referring to the official Deis documentation, because it's just 3 lines of text with one command that should work, but obviously it does not.
So maybe some more detailed tutorials with some troubleshooting tips would help.

PS: I raised another issue on the Workflow repo regarding an installation using off-cluster storage, but got no response there either, and for that I also followed the official steps.

And I also couldn't find good documentation or a simple example of how to deploy an app from a Docker image/Dockerfile.

I understand your frustration. Even if the documentation is lacking, there are example applications in the GitHub org for nearly any configuration you're looking for, and we do link to those example applications in the documentation. For example: https://github.com/deis/example-dockerfile-http

Have you taken a look at the troubleshooting documentation? That should give you a general guideline on how to self-troubleshoot why your cluster is not working the way it should. If all else fails you can troubleshoot directly with kubectl, following the Kubernetes documentation.

Hi @bacongobbler, thank you for the answer.
I did troubleshoot my Kubernetes cluster and noticed that the deis-registry-proxy component was not running.
This example https://github.com/deis/example-dockerfile-http is one that I tried.

As I was writing this I went to check the Deis pods again and, surprisingly, I now have 2 deis-registry-proxy instances running. That is strange; I haven't changed anything since I posted the issue.
Thanks again.