deis/registry

Multiple Replicas

I recently tried scaling the registry up to increase throughput and redundancy, but after doing so pushes started to fail with "blob upload unknown" errors. It appears the registry keeps some internal state while a push is ongoing. To fix the issue, I set sessionAffinity: ClientIP on the deis-registry service. Any reason this would be a bad idea? If not, I can post a pull request adding it to the service template so scaling works out of the box.
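
For reference, a minimal sketch of what that change looks like on the Service; the metadata, selector, and ports below are illustrative, not the exact values from the deis-registry template:

```yaml
# Sketch of the deis-registry Service with client-IP session affinity enabled.
# sessionAffinity is a standard Kubernetes Service field; the name, labels,
# selector, and ports shown here are assumptions, not copied from the chart.
apiVersion: v1
kind: Service
metadata:
  name: deis-registry
  namespace: deis
spec:
  # Route all requests from a given client IP to the same registry pod, so an
  # in-progress push keeps hitting the replica that holds its upload state.
  sessionAffinity: ClientIP
  selector:
    app: deis-registry
  ports:
  - port: 80
    targetPort: 5000
```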

vdice commented

@croemmich I'd say if this solution is working for you, a PR would definitely be welcome for us to test out; haven't yet experimented with multiple registry replicas here.

bacongobbler commented

Hey!

So just for context, this repo is just a few wrapper scripts around the upstream Docker registry. I noticed there are a few upstream issues related to "blob upload unknown": https://github.com/docker/distribution/search?q=blob+upload+unknown&type=Issues&utf8=%E2%9C%93.

I'd suggest starting there and seeing if there's anything actionable upstream before we start pinning tails on the donkey.

@bacongobbler, from what I can tell looking through those issues, this is a general problem with distributed file stores like S3, Swift, and GCS. A layer is written to one storage node but isn't immediately available if the next request hits a different node, which is likely. When the registry tries to read back the layer metadata for a consistency check, a 404 is returned if that node doesn't have the file yet. The S3 and Swift drivers have a wait function that polls for the segments to show up before continuing; the GCS driver, which I'm using, does not.

It would appear the registry keeps an internal cache of previously uploaded layers, which lets the next layer be pushed without a check against the storage backend; my workaround solves the issue by relying on that cache. Ideally the issue would be fixed upstream, but the required polling fix is pretty dirty.

I'd recommend the sessionAffinity: ClientIP fix for the time being: it fixes the issue with GCS immediately, and I suspect it will also slightly improve upload performance with multiple replicas on S3 and Swift, since they won't have to poll for layers. The downside is that concurrent reads and writes from the same Kube node aren't spread across replicas, but with a single replica that's the case anyway. It still provides high(er) availability and scaling across the cluster, which was my main concern.

If you feel like hacking up a PR, that would be wonderful.

@bacongobbler, PR created, sorry for the delay.