distribution/distribution

Garbage collect untagged manifests

Vanuan opened this issue · 18 comments

As suggested in #1600, there should be a garbage collect option to delete all manifests that are not referenced by any tag (dangling).

Use case 1:

  • push a new tag
  • delete an old tag (by deleting a manifest)
  • reclaim disk space by running garbage collection

Use case 2:

  • push a latest tag (creating "manifest 1")
  • push a latest tag (creating "manifest 2")
  • reclaim disk space by running garbage collection (delete "manifest 1" automatically)

#1813 is a pre-requisite to this feature

I assume that the first use case should already be working:

  1. fetch a list of tags: GET /v2/<name>/tags/list (sorted by create/update date)
  2. for each tag, fetch a manifest digest: GET /v2/<name>/manifests/<reference> (use tag for reference)
  3. delete all tags except the last created/updated one: DELETE /v2/<name>/manifests/<reference> (use digest for reference)
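The three steps above could be sketched roughly as follows. This is only an illustration, not a tested cleanup tool: `base_url`, the helper names, and the choice to pass the kept tags explicitly are assumptions, and it presumes the registry was started with deletion enabled. Since deleting a manifest by digest also removes every tag pointing at it, the sketch first filters out digests that a kept tag still uses.

```python
import json
import urllib.request

# Asking for the v2 manifest type so the returned digest is the one DELETE expects.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def digests_to_delete(tag_digests, keep_tags):
    """Return digests that no kept tag points to (pure function, no network)."""
    kept = {tag_digests[tag] for tag in keep_tags}
    return sorted(set(tag_digests.values()) - kept)


def list_tags(base_url, name):
    """Step 1: GET /v2/<name>/tags/list."""
    with urllib.request.urlopen(f"{base_url}/v2/{name}/tags/list") as resp:
        return json.load(resp).get("tags") or []


def tag_digest(base_url, name, tag):
    """Step 2: GET /v2/<name>/manifests/<tag>, digest is in a response header."""
    req = urllib.request.Request(
        f"{base_url}/v2/{name}/manifests/{tag}",
        headers={"Accept": MANIFEST_V2},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Docker-Content-Digest"]


def delete_manifest(base_url, name, digest):
    """Step 3: DELETE /v2/<name>/manifests/<digest>."""
    req = urllib.request.Request(
        f"{base_url}/v2/{name}/manifests/{digest}", method="DELETE"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def cleanup(base_url, name, keep_tags):
    """Delete every tagged manifest that is not used by one of keep_tags."""
    tag_digests = {t: tag_digest(base_url, name, t) for t in list_tags(base_url, name)}
    for digest in digests_to_delete(tag_digests, keep_tags):
        delete_manifest(base_url, name, digest)
```

A call such as `cleanup("http://localhost:5000", "myapp", ["latest"])` (hypothetical host and repository) would remove the manifests of all other tags; disk space is only reclaimed once garbage collection runs afterwards.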

But there's no way to implement the second one.

In our company we use tags that match the branch name of the code, which corresponds to your use case 2. I was actually quite surprised that this wasn't the standard behaviour, since as far as I can see there is no way to retrieve the old manifest digests from the API. Our current workaround is to delete the current manifest before pushing a new one.

bwb commented

Use case 2 is extremely common.

Blobs that would be eligible for garbage collection were it not for untagged manifests consume the majority of storage space allocated to the private registries I'm responsible for.

Is anyone working on this?

Correct me if I'm wrong, but the second use case scenario is:

For each repository:

  1. get the list of all manifests
  2. for each tag, remove the tag's manifest from the list of all manifests
  3. remove the remaining manifests from storage

After the above steps, the garbage collector will reclaim the disk space.
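The selection in steps 1 and 2 is a plain set difference. A minimal sketch, assuming the full list of manifest digests has already been obtained somehow (the thread notes the HTTP API does not expose it, so it would have to come from the storage backend); the function name and inputs are hypothetical:

```python
def untagged_manifests(all_digests, tag_digests):
    """Return the digests in all_digests that no tag points to.

    all_digests: iterable of every manifest digest in the repository
    tag_digests: mapping of tag name -> manifest digest
    """
    tagged = set(tag_digests.values())
    return sorted(d for d in set(all_digests) if d not in tagged)


# Two pushes of "latest": only the second manifest is still tagged.
dangling = untagged_manifests(
    ["sha256:manifest1", "sha256:manifest2"],
    {"latest": "sha256:manifest2"},
)
print(dangling)
```

Whatever this returns is the set of manifests that step 3 would remove before the garbage collector runs.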

Yeah that would be how to solve it. Though there are 3 more steps:

  1. Stop registry
  2. Run gc
  3. Start registry

This should solve point 1: #2199
So that we can implement steps 1-3 through the HTTP API.
Steps 4-6 are still impossible without some docker commands.

Fixed by #2302?

Is it released yet?

Does it require putting the registry into read-only mode or restarting the registry?

It was merged into master ~two weeks ago. Latest release was in July 2017.
It's worth testing it.

I tried a build of the registry (through https://github.com/docker/distribution-library-image) and it seems to work (I pushed several images on the same tag, and all but the latest push were removed), but I didn't try it on a large setup.

It still requires a docker command: docker exec -i -t registry /bin/registry garbage-collect /etc/docker/registry/config.yml -m will do (the -m flag is the new option).

It works without changing the registry mode, but I am not experienced enough with Docker to say if that's safe - feedback welcome. I may be wrong, but I'd try running it as a daily cron task rather than having a remote script trigger it.
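For the cron idea, a small wrapper script might look like the sketch below. The container name and config path are the ones from the command quoted in this thread; the function names are made up, and dropping the interactive `-i -t` flags is an assumption on the grounds that cron provides no terminal.

```python
import subprocess


def gc_command(container="registry",
               config="/etc/docker/registry/config.yml",
               delete_untagged=True):
    """Build the argv for running garbage collection inside the container."""
    cmd = ["docker", "exec", container, "/bin/registry", "garbage-collect", config]
    if delete_untagged:
        # -m is the new option discussed in this thread (remove untagged manifests).
        cmd.append("-m")
    return cmd


def run_gc(**kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(gc_command(**kwargs), check=True)


if __name__ == "__main__":
    run_gc()
```

Whether it is safe to run this while the registry is accepting pushes is exactly the open question raised above, so treat it as something to try on a test setup first.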

I'd love to see a new release tagged so this functionality can start to permeate the ecosystem :)

Wouldn't this be problematic for people that are just pushing digests?

Maybe if they only pushed digests they should not call the garbage-collect command. However I really do not see a reason to only push digests.

@taladar Why push tags, if your entire pipeline just needs digests?

@sargun It's just a feature. You're not required to use it. If you can afford infinite disk space you would just not use the "garbage collect untagged manifests" option. But people running small projects actually run out of space sometimes :)

Yes, there's a drawback: if you're running the tag "latest" and you push another "latest" (thus automatically deleting the previous digest if it isn't tagged), your services will fail because the digest they're running can no longer be found.

As a workaround you could just tag every image you push with a timestamp. This way you would essentially have human-readable digests and the ability to remove, say, all digests older than a week. To implement the same functionality now you have to go through the unbearable pain of implementing a service which would talk to the registry through its API, which still doesn't support proper deletion, and then restart it to garbage collect.
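To illustrate the timestamp-tag workaround, here is a minimal sketch of picking the tags older than a week. The tag format `YYYYMMDD-HHMMSS` (interpreted as UTC) is purely an assumption for the example; tags that don't parse, such as "latest", are left alone.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tag format, e.g. "20190101-000000"; adjust to your own scheme.
TAG_FORMAT = "%Y%m%d-%H%M%S"


def expired_tags(tags, now, max_age=timedelta(days=7)):
    """Return the timestamp tags whose age exceeds max_age."""
    expired = []
    for tag in tags:
        try:
            pushed = datetime.strptime(tag, TAG_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not a timestamp tag, never expire it here
        if now - pushed > max_age:
            expired.append(tag)
    return expired


now = datetime(2019, 1, 10, tzinfo=timezone.utc)
old = expired_tags(["20190101-000000", "20190109-120000", "latest"], now)
print(old)
```

The resulting tags would still have to be resolved to digests and deleted through the API before garbage collection can free anything.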

2.7.0 has been officially released

Can this issue be closed as of ff87ad8 (#2302)? i.e. is #2301 a duplicate of this issue?