Security review: recover corrupt caches
natalieparellano opened this issue · 7 comments
Summary
In the security review, this is LOW-2: Denial-of-Service (DoS) provoked by removing build cache tarballs or altering the OCI image manifest
. The action plan asks us to ensure that
If a tarball is missing, a solution should be found by either rebuilding the corresponding tarball or wiping out the cache in order to continue the containerization process without errors, or, a second execution should be possible without errors
Further context from the initial report:
It appears that if a tarball referenced in the io.buildpacks.lifecycle.cache.metadata file is absent on the container filesystem (mounted host volume), the application containerization process quit without wiping out the cache content.
Proposal
- See here where the restorer will fail if the cache does not contain a layer with the expected diffID:
Line 132 in 44b7041
- See here where the exporter blows up if a cached layer has no contents:
Line 395 in 44b7041
- When the first error is encountered, we should wipe the cache.
Alternatively, we considered updating the cache metadata to exclude the layer/tarball that is missing. But, we are not sure if this scenario is common enough to warrant such a surgical approach.
Related
RFC #___
Context
I've got things working on my branch corrupted-cache
. But I still need to add at least one test for the Exporter
change.
@natalieparellano I went a different direction that we originally discussed. Instead of wiping out the cache, I am instead just ignoring the missing files. What do you think? Happy to do whatever.
cc @jabrown85
Instead of wiping out the cache, I am instead just ignoring the missing files.
Sounds good to me :) should we land this one?
I think @joeybrown-sf was hoping to fix up a test or something? What remains here @joeybrown-sf ?
Circling back, see discussion on buildpacks/lifecycle-private#16, our fix is working for volume caches, but for image caches we are NOT getting "layer not found" errors where we expect them (and hence are failing and bubbling up the error instead of skipping over the layer). This requires further investigation in imgutil. We added FIXMEs in the code so that it is apparent that the image cache flow requires further work. We could leave this issue open or create another issue that is specific to image caches.