distribution/distribution

Garbage collection does not delete data from cache, preventing reupload of deleted image

roblabla opened this issue · 2 comments

Description

Currently, GC does not delete data from the Redis cache. This can result in a very weird scenario where reuploading an image that was deleted appears to succeed (the registry thinks it already has the files we're uploading, and so returns 201), but then trying to use the image fails with a MANIFEST_UNKNOWN error because the file is not actually present.

Reproduce

We'll need a registry configured to use a Redis cache. I used filesystem storage with a Redis cache for simplicity, but any storage backend will do, since the problem stems from the cache.

Here's my configuration:

version: 0.1
log:
  level: debug
  fields:
    service: registry
storage:
  filesystem:
    rootdirectory: data
    maxthreads: 100
  cache:
    layerinfo: redis
  maintenance:
    uploadpurging:
      enabled: true
      age: 168h
      interval: 24h
      dryrun: false
  delete:
    enabled: true
  redirect:
    disable: true
redis:
  addr: localhost:6379
  db: 2
  readtimeout: 10s
  writetimeout: 10s
  dialtimeout: 10s
  pool:
    maxidle: 100
    maxactive: 500
    idletimeout: 60s
http:
  secret: test123
  addr: :5100
  relativeurls: false
  debug:
    addr: localhost:5101
validation:
  disabled: true
compatibility:
  schema1:
    enabled: true

Here's the reproducer:

# First, start the registry and put it in the background.
./bin/registry serve config.yaml &

# Next, upload an image. I used skopeo for this
skopeo --insecure-policy copy --dest-tls-verify=false "docker://alpine:3.19.1" "docker://localhost:5100/alpine:3.19.1" --multi-arch all

# We can verify that the push worked by downloading its manifest. So far so good.
curl -H 'Accept:application/vnd.docker.distribution.manifest.v2+json' http://localhost:5100/v2/alpine/manifests/3.19.1

# Next up, let's delete the manifest.
curl -XDELETE http://localhost:5100/v2/alpine/manifests/3.19.1

# And run garbage collection. This will actually delete the data from the blobstore. However, the cache won't be cleaned!
./bin/registry garbage-collect config.yaml -m

# And finally, let's try pushing the same image again
skopeo --insecure-policy copy --dest-tls-verify=false "docker://alpine:3.19.1" "docker://localhost:5100/alpine:3.19.1" --multi-arch all

# And download the manifest again. This will fail with the MANIFEST_UNKNOWN error.
curl -H 'Accept:application/vnd.docker.distribution.manifest.v2+json' http://localhost:5100/v2/alpine/manifests/3.19.1

Expected behavior

I expect pushing the image to actually work. Either pushing should look past the cache and check whether it is still up to date with the storage backend, or garbage collection should remove the stale keys from the cache.
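
Until one of those is implemented, a possible workaround is to flush the cache database right after garbage collection. The sketch below assumes the Redis database from my configuration (db 2 on localhost:6379) holds nothing but registry cache data, since FLUSHDB deletes every key in the selected database. The function name and the REGISTRY_BIN and REDIS_CLI override variables are hypothetical helpers for illustration, not registry settings.

```shell
# Workaround sketch: run garbage collection, then flush the Redis database
# that backs the registry cache so no stale entries survive.
# ASSUMPTION: the selected Redis database contains only registry cache data;
# FLUSHDB removes every key in it.

# REGISTRY_BIN and REDIS_CLI are hypothetical override variables; they
# default to the binaries used in the reproducer above.
gc_and_flush_cache() {
  config="${1:-config.yaml}"
  redis_host="${2:-localhost}"
  redis_port="${3:-6379}"
  redis_db="${4:-2}"

  # Same garbage-collect invocation as in the reproducer (-m included).
  "${REGISTRY_BIN:-./bin/registry}" garbage-collect "$config" -m || return 1

  # Drop all cached entries so the next push checks the real storage again.
  "${REDIS_CLI:-redis-cli}" -h "$redis_host" -p "$redis_port" -n "$redis_db" FLUSHDB
}
```

Calling `gc_and_flush_cache config.yaml localhost 6379 2` would replace the plain garbage-collect step in the reproducer. This only hides the symptom: the cache is rebuilt from scratch after every run, and the underlying bug remains.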

registry version

Tested on both 2.8.3 and the main branch (9b3eac8).

./bin/registry github.com/distribution/distribution/v3 v3.0.0-alpha.1.m+unknown

Additional Info

This was discovered via Harbor's retention policy. Harbor uses distribution as the underlying backend for its storage. When it applies its retention policy, it deletes the manifests and automatically runs garbage collection to free up space, which triggers this bug.

I also encountered this problem. How can it be solved?

I think this might be related to PR #3323