devopshq/artifactory-cleanup

_collect_docker_size queries for all items in the registry

tiagomeireles opened this issue · 4 comments

Running an AQL query to get all items is very slow on large repositories. I also use object storage for the binary store, which likely contributes to the slower queries.

Example rule combination that I'm trying to use:

    - name: Example
      rules:
        - rule: Repo
          name: "docker"
        - rule: IncludePath
          masks: "app/*"
        - rule: DeleteDockerImagesOlderThan
          days: 14

    args = ["items.find", {"$or": [{"repo": repo} for repo in docker_repos]}]

I tested replacing this line with the following, and it is significantly faster while still returning the size info:

    args = ["items.find", {"$or": [{"path": {"$match": "app/*"}}]}]
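Rendering both argument lists as AQL text makes the difference visible: the original criteria constrain only `repo`, so every item in each Docker repository matches, while the tested replacement constrains `path` instead (and, as written, no longer constrains `repo`). This is a minimal sketch; `to_aql` is a hypothetical helper that assumes the `items.find({...})` text form of AQL, not the serializer artifactory-cleanup actually uses.

```python
import json
from typing import Any, List


def to_aql(args: List[Any]) -> str:
    """Render ["items.find", {criteria}] as AQL text: items.find({...})."""
    domain, criteria = args
    # The criteria dict is sent as a single JSON argument of the find() call.
    return f"{domain}({json.dumps(criteria)})"


docker_repos = ["docker"]

# Original query: every item in every Docker repo, no path restriction.
repo_wide = ["items.find", {"$or": [{"repo": repo} for repo in docker_repos]}]

# Tested replacement: only items whose path matches app/* (in any repo).
path_scoped = ["items.find", {"$or": [{"path": {"$match": "app/*"}}]}]

print(to_aql(repo_wide))    # items.find({"$or": [{"repo": "docker"}]})
print(to_aql(path_scoped))  # items.find({"$or": [{"path": {"$match": "app/*"}}]})
```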

I'm happy to attempt a fix. I thought of two potential options: disabling the size lookup, or accepting a mask on DeleteDockerImagesOlderThan.
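The second option, a mask on DeleteDockerImagesOlderThan, could translate into AQL roughly as below. This is only a sketch: `build_size_query` and its `masks` parameter are hypothetical names, not existing options of the rule. Unlike the tested line, it keeps the repo filter and ANDs it with the path masks, so items in other repositories are not matched.

```python
from typing import Any, Dict, List, Optional


def build_size_query(docker_repos: List[str], masks: Optional[List[str]] = None) -> List[Any]:
    """Build AQL args for the Docker size lookup, optionally scoped by path masks.

    Hypothetical helper: `masks` stands in for a mask option on
    DeleteDockerImagesOlderThan, which the rule does not accept today.
    """
    repo_filter: Dict[str, Any] = {"$or": [{"repo": repo} for repo in docker_repos]}
    if not masks:
        # No mask given: same repo-wide query as the current code.
        return ["items.find", repo_filter]
    path_filter = {"$or": [{"path": {"$match": mask}} for mask in masks]}
    # Keep the repo restriction and narrow it further by path.
    return ["items.find", {"$and": [repo_filter, path_filter]}]
```

Passing no masks keeps today's behaviour, so existing configurations would not change.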

> it is significantly faster while retaining the size info.

What timings are you seeing? Could you give an example from your case?

If a cleanup script runs for even an hour each night, that should be fine, IMO.

I think right now it's not possible to pass other rules' attributes to DeleteDockerImagesOlderThan; that is why we query it this way.

I stopped it after 3 hours.

I have a large backlog of things to clean up, and repo-wide searches are very slow. Right now I'm using the following patch to filter on the common path of the returned artifacts, which avoids any additional parameters:

            from os import path
            common_path = path.commonpath([artifact["path"] for artifact in artifacts])
            repo_filter = {"$or": [{"repo": repo} for repo in docker_repos]}  # match any Docker repo
            args = ["items.find", {"$and": [repo_filter, {"path": {"$match": f"{common_path}/*"}}]}]
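The patch relies on `commonpath` comparing whole path components rather than characters. A small illustration with made-up paths, assumed to be repository-relative and slash-separated; `posixpath` is used here instead of `os.path` so the result uses `/` on any OS (on Windows, `os.path` is `ntpath`, which joins with backslashes).

```python
import posixpath

# Hypothetical artifact paths, for illustration only.
paths = ["app/web/1.0", "app/web/1.1", "app/webhooks/2.0"]

# commonpath compares whole components: the shared parent is "app".
print(posixpath.commonpath(paths))    # app
# commonprefix is character-based and yields "app/web", which is not
# actually a parent of "app/webhooks/2.0".
print(posixpath.commonprefix(paths))  # app/web
```

Using `commonprefix` here would build a mask like `app/web/*` that silently skips `app/webhooks`.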

Deletes are also slow in my case: each delete takes a couple of minutes. Right now they are performed serially. Have parallel deletes been considered?

> I stopped it after 3 hours.

Agreed, that sounds awful. With the common-path approach, the common path may turn out to be /, in which case the request is the same as before.
But we can add it as a quick fix if it helps in some cases. Could you create a PR for that?
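One way to handle that degenerate case is to fall back to the repo-wide query whenever the common path carries no information. A minimal sketch, assuming artifacts are dicts with a repository-relative `path` key as in the patch above; `size_query_for` is a hypothetical name, and treating `""`, `"."` and `"/"` as "no usable prefix" is an assumption about what paths can come back.

```python
import posixpath
from typing import Any, Dict, List


def size_query_for(artifacts: List[Dict[str, str]], docker_repos: List[str]) -> List[Any]:
    """Scope the Docker size query to the artifacts' common path when that helps."""
    repo_filter: Dict[str, Any] = {"$or": [{"repo": repo} for repo in docker_repos]}
    paths = [artifact["path"] for artifact in artifacts]
    # commonpath raises ValueError on an empty list, so guard first.
    common = posixpath.commonpath(paths) if paths else ""
    if common in ("", ".", "/"):
        # No useful common prefix: fall back to the original repo-wide query.
        return ["items.find", repo_filter]
    path_filter = {"path": {"$match": f"{common}/*"}}
    return ["items.find", {"$and": [repo_filter, path_filter]}]
```

The fallback keeps the quick fix safe: in the worst case it sends the same request as today rather than a wrong one.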

> have parallel deletes been considered?

There has been no need so far, but it's possible. We could use a thread pool for that as an easy fix.
If you want to add it too, please create a separate PR for it; don't mix it with the common-path change.
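The thread-pool idea could look roughly like this. Deletes are network-bound, so threads can overlap the waiting despite the GIL. This is a sketch under assumptions: `delete_in_parallel` and the `delete_one` callback are hypothetical; `delete_one` stands in for whatever single-item delete call the tool already makes, and `max_workers=8` is an arbitrary default that should stay modest so Artifactory is not flooded.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def delete_in_parallel(
    paths: List[str],
    delete_one: Callable[[str], None],
    max_workers: int = 8,
) -> Dict[str, BaseException]:
    """Run delete_one(path) for every path on a thread pool.

    Returns a mapping of path -> exception for the deletes that failed,
    so one failing delete does not stop the others.
    """
    failures: Dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(delete_one, p): p for p in paths}
        for future in as_completed(futures):
            # exception() returns None when the call finished normally.
            exc = future.exception()
            if exc is not None:
                failures[futures[future]] = exc
    return failures
```

Collecting failures instead of raising on the first one lets the rest of the batch finish; the caller can then log or retry the returned paths.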