bpineau/katafygio

Jobs created by CronJobs are causing too many commits

fredleger opened this issue · 2 comments

Following issue #81, I noticed that the long git init time came from the fact that there were 105 000 commits in the repo!

Digging into this, I noticed that a couple of Jobs created by a CronJob are backed up every minute.

Even though we can still exclude the Job kind, I don't think this is a good option, since there can be valid Jobs to back up (like init ones). My proposal would be to have some sort of filtering, based either on the OwnerReference fields of a Job object or on names like mycronjob-jobname-* (but this is less elegant, I guess).

What do you think?
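To make the two proposed filters concrete, here is a minimal Go sketch. The `Object` and `OwnerReference` types and the `skipJob` helper are hypothetical stand-ins for illustration, not katafygio's actual types or API. It assumes glob-style name patterns, matched with the standard library's `path.Match`.

```go
package main

import "path"

// OwnerReference mirrors the subset of metadata.ownerReferences used here.
// Hypothetical type, for illustration only.
type OwnerReference struct {
	Kind string
	Name string
}

// Object is a hypothetical, trimmed-down view of a Kubernetes object.
type Object struct {
	Kind   string
	Name   string
	Owners []OwnerReference
}

// ownedByKind reports whether any owner reference has the given kind.
func ownedByKind(o Object, kind string) bool {
	for _, ref := range o.Owners {
		if ref.Kind == kind {
			return true
		}
	}
	return false
}

// nameMatches reports whether the object name matches a shell-style glob.
// A malformed pattern is treated as "no match".
func nameMatches(o Object, pattern string) bool {
	ok, err := path.Match(pattern, o.Name)
	return err == nil && ok
}

// skipJob returns true for Jobs that should not be backed up: those owned
// by a CronJob, or those whose name matches excludeGlob (when non-empty).
// Objects of any other kind are never skipped by this filter.
func skipJob(o Object, excludeGlob string) bool {
	if o.Kind != "Job" {
		return false
	}
	if ownedByKind(o, "CronJob") {
		return true
	}
	return excludeGlob != "" && nameMatches(o, excludeGlob)
}
```

With this sketch, a standalone Job such as an init Job has no CronJob owner and is kept, while a Job spawned by a CronJob is skipped even when no name pattern is configured.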

Good idea, both suggestions seem worth having. The first one is more precise, while the second one may cover more cases. Maybe we could also accept wildcards or regexps to filter namespaces (and their content) too.

I do filter out pods (as all mine are generated from higher-level objects), but that's surely not a solution for everyone. I'll prototype something to see how it goes (I have to figure out a CLI interface to express the owner refs filters, for instance). Thanks!

What about introducing an opt-in feature to exclude all objects that have metadata.ownerReferences present? AFAIK all objects generated dynamically in Kubernetes should have this field set; see https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/

I would like to use this feature too (mainly for ReplicaSets and Pods).
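The opt-in exclusion described above could look roughly like the following Go sketch. It assumes the object is available as JSON (as returned by the Kubernetes API), and the names `hasOwner`, `shouldExclude` and the `excludeOwned` switch are hypothetical, not existing katafygio options.

```go
package main

import "encoding/json"

// objectMeta decodes only the part of an object needed for the check.
type objectMeta struct {
	Metadata struct {
		OwnerReferences []json.RawMessage `json:"ownerReferences"`
	} `json:"metadata"`
}

// hasOwner reports whether metadata.ownerReferences holds at least one entry.
// A missing field and an empty list are both treated as "no owner".
func hasOwner(raw []byte) (bool, error) {
	var m objectMeta
	if err := json.Unmarshal(raw, &m); err != nil {
		return false, err
	}
	return len(m.Metadata.OwnerReferences) > 0, nil
}

// shouldExclude applies the filter only when the (hypothetical) opt-in
// switch excludeOwned is enabled, so the default behavior is unchanged.
func shouldExclude(raw []byte, excludeOwned bool) (bool, error) {
	if !excludeOwned {
		return false, nil
	}
	return hasOwner(raw)
}
```

Under these assumptions, a ReplicaSet created by a Deployment would be excluded when the switch is on, while the Deployment itself (which has no owner) would still be backed up.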

Optionally (if you have some easy way to parse YAML), you could consider adding more robust filtering options based on document structure and values, e.g. whether a field is present or matches a regexp.
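As a rough illustration of such structure-based filtering, the Go sketch below evaluates a rule against an already-decoded document (nested `map[string]interface{}`, as a YAML or JSON decoder would typically produce). The `Rule` type, `NewRule`, and the dotted-path syntax are assumptions made for this example, not an existing interface.

```go
package main

import (
	"regexp"
	"strings"
)

// Rule matches a document when the field at Path exists and, if Pattern
// is set, when the field is a string matching Pattern.
type Rule struct {
	Path    string         // dotted path, e.g. "metadata.name"
	Pattern *regexp.Regexp // nil means "field is present"
}

// NewRule builds a Rule; an empty pattern yields a presence-only rule.
func NewRule(path, pattern string) (Rule, error) {
	if pattern == "" {
		return Rule{Path: path}, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return Rule{}, err
	}
	return Rule{Path: path, Pattern: re}, nil
}

// lookup walks a dotted path through nested maps and returns the value found.
func lookup(doc map[string]interface{}, dotted string) (interface{}, bool) {
	var cur interface{} = doc
	for _, key := range strings.Split(dotted, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// Matches reports whether the rule applies to the given document.
func (r Rule) Matches(doc map[string]interface{}) bool {
	v, ok := lookup(doc, r.Path)
	if !ok {
		return false
	}
	if r.Pattern == nil {
		return true
	}
	s, isString := v.(string)
	return isString && r.Pattern.MatchString(s)
}
```

A presence-only rule on `metadata.ownerReferences` would reproduce the owner-based exclusion, and a rule on `metadata.name` with a pattern such as `^mycronjob-` would reproduce the name-based one, so both earlier proposals become special cases of the same mechanism.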