buildpacks/lifecycle

Security review: recover corrupt caches

natalieparellano opened this issue · 7 comments

Summary

In the security review, this is LOW-2: Denial-of-Service (DoS) provoked by removing build cache tarballs or altering the OCI image manifest. The action plan asks us to ensure that

If a tarball is missing, a solution should be found by either rebuilding the corresponding tarball or wiping out the cache in order to continue the containerization process without errors, or, a second execution should be possible without errors

Further context from the initial report:

It appears that if a tarball referenced in the io.buildpacks.lifecycle.cache.metadata file is absent on the container filesystem (mounted host volume), the application containerization process quit without wiping out the cache content.


Proposal

  • See here where the restorer will fail if the cache does not contain a layer with the expected diffID:
    rc, err := cache.RetrieveLayer(sha)
  • See here where the exporter blows up if a cached layer has no contents:
    return fmt.Errorf("layer '%s' is cache=true but has no contents", fsLayer.Identifier())
    (we should keep this one?)
  • When the first error is encountered, we should wipe the cache.

Alternatively, we considered updating the cache metadata to exclude the layer/tarball that is missing. But, we are not sure if this scenario is common enough to warrant such a surgical approach.


Related

RFC #___


Context

Here's a demo of the broken behavior.

demo-failure

And here's a demo of the expected behavior.

expected-demo

I've got things working on my branch corrupted-cache. But I still need to add at least one test for the Exporter change.

@natalieparellano I went a different direction that we originally discussed. Instead of wiping out the cache, I am instead just ignoring the missing files. What do you think? Happy to do whatever.

Instead of wiping out the cache, I am instead just ignoring the missing files.

Sounds good to me :) should we land this one?

I think @joeybrown-sf was hoping to fix up a test or something? What remains here @joeybrown-sf ?

Circling back, see discussion on buildpacks/lifecycle-private#16, our fix is working for volume caches, but for image caches we are NOT getting "layer not found" errors where we expect them (and hence are failing and bubbling up the error instead of skipping over the layer). This requires further investigation in imgutil. We added FIXMEs in the code so that it is apparent that the image cache flow requires further work. We could leave this issue open or create another issue that is specific to image caches.