fpco/cache-s3

Handle problem with continuously increasing cache

Closed this issue · 5 comments

It is possible for individual files that are being cached to become irrelevant, e.g. new modules are added and old ones deleted, which results in obsolete files that could be cleaned up in order to reduce cache size as well as the bandwidth to/from AWS.

It would be too cumbersome and complicated to keep track of individual files, but considering that we are dealing with a cache, it is acceptable to lose the whole thing every so often. There are two possible solutions I have for this problem. They are not mutually exclusive and could be implemented side by side:

  • The first is a lifetime restriction. It'd be possible to set a period of time on the cache file itself (which would NOT be updated during cache replacement). During that period the cache file would be considered valid, but once it reached its life expectancy it would be removed instead of restored.
  • The second is the addition of a --max-size flag that would prevent cache-s3 from restoring or saving a cache that exceeds that limit.

Both solutions would effectively clear out the cache under certain conditions, thus alleviating the problem of a continuously increasing cache.
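
For illustration only, here is a rough sketch of how the two checks could combine at restore time. The threshold names and values are assumptions made up for this example, not cache-s3's actual implementation (which is written in Haskell):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds corresponding to the two proposals above;
# names and defaults are illustrative, not cache-s3's actual flags.
MAX_AGE = timedelta(days=14)      # lifetime restriction
MAX_SIZE = 5 * 1024 ** 3          # --max-size style limit, 5 GiB

def should_restore(created_at: datetime, size_bytes: int) -> bool:
    """Return True if the cache object is still worth restoring."""
    age = datetime.now(timezone.utc) - created_at
    if age > MAX_AGE:
        # Past its life expectancy: drop it instead of restoring.
        return False
    if size_bytes > MAX_SIZE:
        # Over the size limit: skip restore (and saving) to cap growth.
        return False
    return True
```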

mbj commented

@lehins If you are using S3 you can set a lifecycle policy to expire objects after N days.

This AFAIK does not factor in use. Still, I'm okay with losing my cache after (in my case) 14 days, as usually no cached object ever lives that long. And occasionally re-building things is also not that bad, to prove it can still build.
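
For reference, such an expiration rule can be expressed as a bucket lifecycle configuration. A minimal sketch with boto3; the bucket name, prefix, and 14-day window are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Expire cache objects under the given prefix after 14 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-ci-cache-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-stale-ci-cache",
                "Filter": {"Prefix": "cache-s3/"},
                "Status": "Enabled",
                "Expiration": {"Days": 14},
            }
        ]
    },
)
```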

Ideally cache-s3 would be able to "touch" in-use objects to prevent them from being evicted.
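
One way such a "touch" could work is an in-place copy, which gives the object a fresh creation date, the date that lifecycle expiration is measured from. A sketch (bucket and key are placeholders; cache-s3 does not do this today):

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

def touch_object(bucket: str, key: str) -> None:
    """Copy an object onto itself so S3 records a fresh creation date,
    pushing back lifecycle expiration for objects still in use."""
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        MetadataDirective="REPLACE",
        Metadata={"touched-at": datetime.now(timezone.utc).isoformat()},
    )
```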

lehins commented

@mbj, I totally agree with the expiration policy; that is in fact the suggested way, as it says in the readme:

The bucket should also be configured to expire older files, this way cache stored for ephemeral branches will be discarded, hence avoiding unnecessary storage costs.

Unfortunately, that doesn't solve the problem for repositories that are being updated often.

I don't know if "touching" objects to prevent them from being expired on S3 is in the scope of this tool. Imagine you don't push anything to your repo for 15 days. Where is cache-s3 running during that period so it can "touch" the files? There were no CI builds during those two weeks, so your cache on S3 with a 14-day policy is doomed to expire, unless you "touch" the files manually.

mbj commented

I don't know if "touching" objects to prevent them from being expired on S3 is in the scope of this tool. Imagine you don't push anything to your repo for 15 days. Where is cache-s3 running during that period so it can "touch" the files? There were no CI builds during those two weeks, so your cache on S3 with a 14-day policy is doomed to expire, unless you "touch" the files manually.

That's totally valid. I just wanted to let you know about the option (as I did not see it in the readme originally). It has already resolved some of the pain for us.

OT: I'd argue that a build that did not run within the lifetime of a cache should be done from scratch anyway, but this may be different in your case.

I totally support additions to this tool. My post was only about letting you know about the expiry option, which I sadly had to discover independently.

lehins commented

@mbj Thank you for your suggestion. I am always open to new ideas.

The added features also have short descriptions and examples in the Clearing section of the README.
cc @domenkozar, the new version v0.1.4 is released.