theupdateframework/python-tuf

Implementation of Offline mode for TUF

Opened this issue · 13 comments

Sigstore's python client would like to use the TUF updater in a fully offline manner to allow for opt-in offline verification. To do so would require TUF to function using only locally-stored materials.

Within TUF's current implementation, online access is always required, regardless of the validity of already-stored materials. Thus far, we've been able to run the updater on a modified branch using only local materials, and we seek to expand this modification to allow for opt-in usage of expired metadata in offline mode. This would be useful for clients like sigstore that would like to allow verification with metadata that was previously valid. A primary use case would be verifying on machines that do not necessarily maintain a network connection.

Changes on our experimental branch have been within updater.py and have consisted of adding a new boolean to UpdaterConfig and avoiding online refreshes if said boolean has been set. Further implementation would require changes to trusted_metadata_set.py to skip checks against the expiry of the metadata if the aforementioned boolean has been set.

Ideally the flow of TUF's updater with these changes would go as follows, given the boolean is set:

  • Warn users about risks of using offline verification
  • Check to see if local metadata exists; exiting with an error message if none is found
  • Load local metadata, skipping checks against expiry within trusted_metadata_set.py
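
A minimal sketch of the three steps above, not the code on the experimental branch. It assumes the cache holds one `<role>.json` file per role in the standard TUF layout (a `signed` object with an `expires` timestamp). The function name, the `offline` parameter and both exception classes are hypothetical, and signature and version checks are left out entirely.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


class LocalMetadataMissingError(Exception):
    """Hypothetical error: nothing is cached for the requested role."""


class ExpiredMetadataError(Exception):
    """Hypothetical error: cached metadata is past its expiry date."""


def load_cached_signed(
    metadata_dir: Path,
    role: str,
    offline: bool,
    now: Optional[datetime] = None,
) -> dict:
    """Return the "signed" part of the cached <role>.json.

    Signature and version checks are deliberately not modelled here.
    """
    now = now or datetime.now(timezone.utc)

    if offline:
        # Step 1: warn about the reduced guarantees of offline verification.
        logger.warning("Offline mode: metadata freshness is not enforced")

    path = metadata_dir / f"{role}.json"
    if not path.is_file():
        # Step 2: fail clearly when no local metadata exists.
        # (An online updater would download here instead; not modelled.)
        raise LocalMetadataMissingError(f"no cached metadata at {path}")

    signed = json.loads(path.read_text())["signed"]
    expires = datetime.strptime(
        signed["expires"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)

    # Step 3: the expiry check is skipped only when offline is set.
    if not offline and now >= expires:
        raise ExpiredMetadataError(f"{role} expired at {signed['expires']}")
    return signed
```

With `offline=False` the same call raises on expired metadata, so the opt-in only changes the expiry decision and nothing else.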

@jku @woodruffw correct me if I'm missing any details here

jku commented

Thanks for writing this down. I will have a look at the PR by tomorrow.

We had a TUF maintainer meeting in kubecon last week and discussed this area. The general consensus was:

  • this seems like a good idea, let's move forward with at least testing it out
  • There may be some unforeseen effects on the security properties of a complete system: let's move this work forward on the specification side as well so that A) we have a very good definition for the whole process, B) more folks are considering the use case, C) we end up with a single standard solution in the various TUF implementations. There were also unrelated discussions on possibly changing the augmentation process but I think a new TAP in https://github.com/theupdateframework/taps/ is still a likely path for this work.
  • there was a general (mild) preference for the client configuration to be a max time period instead of a boolean. I think this is a fairly small tweak to the idea described in this issue and also something we might still want to consider a bit -- so no need to start rewriting anything right now
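
To illustrate the "max time period instead of a boolean" idea, here is a small sketch. The function and the `max_expired_period` parameter are hypothetical and not part of python-tuf. A period of zero reproduces the normal strict behaviour, and a very large period behaves like the boolean.

```python
from datetime import datetime, timedelta, timezone


def is_usable(
    expires: datetime,
    now: datetime,
    max_expired_period: timedelta = timedelta(0),
) -> bool:
    """Decide whether metadata may still be used.

    Metadata counts as expired once now >= expires. Expired metadata is
    accepted only while it has been expired for less than
    max_expired_period (a hypothetical client setting).
    """
    if now < expires:
        return True
    return now - expires < max_expired_period


expires = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 3, tzinfo=timezone.utc)

strict = is_usable(expires, now)                        # expired, no grace period
lenient = is_usable(expires, now, timedelta(days=7))    # expired 2 days, within 7
too_old = is_usable(expires, now, timedelta(days=1))    # expired 2 days, beyond 1
```
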
jku commented

I'll also copy-paste my definition of the use-case -- I believe this is in line with what you wrote in the original description:

  • I do not have network connection, but do have a TUF metadata cache and target cache
  • I would still like to use TUF to verify the cached targets that I have.
  • In contrast to a normal TUF verification I would like the verification to succeed even if a metadata has expired -- the other security measures provided by TUF are still useful
  • There is an explicit decision to use a "lower security" model here (maybe an option like tool --offline or something that implies that some features are being skipped): network connections should not happen when offline mode is set -- if the offline mode fails and the application still wants to update from remote, it is expected to re-try using an online Updater
jku commented

Note that in the above description, if the application always uses "offline mode" after downloading for the first time, metadata never gets updated after that... So the application is now left to implement some sort of "max offline" time

  • In contrast to a normal TUF verification I would like the verification to succeed even if a metadata has expired -- the other security measures provided by TUF are still useful

I think this is the only point I have some trepidation around 🙂 -- if the metadata in question expires weekly (or less frequently), then IMO a hard error case for expired local metadata makes more sense than success (presumably with lots of warnings about expiry).

On the other hand, if these expiry periods are <24 hours or similar, then this makes sense to me.

Otherwise, everything in #2359 (comment) sounds good, and is aligned with what we'd like on the sigstore-python client!

jku commented

Yeah I could easily be convinced to always respect expiry: it's definitely the part that makes me most wary of this...

It should be noted that the way sigstore clients are currently used in a lot of places would not get a lot of benefit from an offline implementation that does always respect expiry dates: in a lot of cases the client is run in a clean environment on CI so the metadata is always expired. The real fix for that issue might be to document caching and to make it as easy as possible -- so that TUF metadata gets cached and re-used.

in a lot of cases the client is run in a clean environment on CI so the metadata is always expired. The real fix for that issue might be to document caching and to make it as easy as possible -- so that TUF metadata gets cached and re-used.

Yeah, good point, and agreed -- that raises some interesting threat model/attack questions (what can an attacker do if they control the store that the TUF cache gets loaded from?), but IMO those questions largely boil down to "attacker controls the host" and that's already a "game over" scenario (since they could just rewrite the root of trust anyways).

(I also think, at least for CI workflows, that it makes sense to strongly encourage users to prefer online operations: they're online anyways due to the CI being online, and the privacy concerns aren't quite as salient since it isn't a personal machine. So, at least for sigstore-python, my first thought is to document offline mode as really only being useful to human signers/verifiers.)

jku commented

Ok, have spent a little more time with this. There's a thread on sigstore slack (https://sigstore.slack.com/archives/C024FPJKC6L/p1694691208834919) but documenting some technical details here:

  • I think allowing expired metadata has some tricky edge cases: this should likely be a special thing for special use cases if implemented at all (think air-gapped environments)
  • just implementing an "offline" option that raises on expiry and download attempts is very easy but also painful for clients to use
  • however, that combined with "pessimistic caching" might be reasonably simple and actually useful
    • client uses new PessimisticUpdater (same API as Updater)
      • PessimisticUpdater uses an internal Updater with offline=true
      • if metadata loading fails because of expiry (or because the file is not in cache), PessimisticUpdater replaces the internal Updater with an online one (offline=false) and restarts the whole update process
      • This should mostly work really well... there is one nasty consistency failure mode where missing delegated metadata might lead to Updater replacement (and so metadata update) after the client has already downloaded an artifact using the cached metadata. This would either have to lead to an error or just be documented as being possible
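
A rough sketch of the "pessimistic caching" idea above, assuming an updater object exposing `refresh()` and `get_targetinfo()`. The `offline` constructor flag, the factory callable and both exception classes are hypothetical. The sketch takes the "lead to an error" option for the consistency failure mode rather than just documenting it.

```python
from typing import Callable, Optional, Protocol


class OfflineLoadError(Exception):
    """Hypothetical: offline updater hit expired or missing cached metadata."""


class InconsistentUpdateError(Exception):
    """Hypothetical: fallback needed after cached metadata was already used."""


class UpdaterLike(Protocol):
    def refresh(self) -> None: ...
    def get_targetinfo(self, target_path: str) -> Optional[dict]: ...


class PessimisticUpdater:
    """Try the cache first; fall back to an online updater when it fails."""

    def __init__(self, make_updater: Callable[[bool], UpdaterLike]) -> None:
        # make_updater(offline) is an assumed factory returning an updater.
        self._make_updater = make_updater
        self._inner = make_updater(True)
        self._offline = True
        self._served_from_cache = False

    def _go_online(self) -> None:
        if self._served_from_cache:
            # The nasty case: a target was already resolved with cached
            # metadata, so refreshing now could mix two repository states.
            raise InconsistentUpdateError("metadata update needed mid-session")
        self._inner = self._make_updater(False)
        self._offline = False
        self._inner.refresh()  # restart the whole update process online

    def refresh(self) -> None:
        try:
            self._inner.refresh()
        except OfflineLoadError:
            self._go_online()

    def get_targetinfo(self, target_path: str) -> Optional[dict]:
        try:
            info = self._inner.get_targetinfo(target_path)
        except OfflineLoadError:
            self._go_online()
            info = self._inner.get_targetinfo(target_path)
        if self._offline and info is not None:
            self._served_from_cache = True
        return info
```

The client only ever sees one object with the usual method names; whether the answer came from the cache or from a fresh download is an internal detail, except in the error case.
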
jku commented

Documented my thinking here:
https://docs.google.com/document/d/1IEVxgCsmLJNiAwdTFQ4aMvmGHJikmI_iOEHGpt8fIe8/edit?usp=sharing

Main takeaway: I don't think a user option "--offline" in sigstore or other TUF using app makes sense unless we also allow expired metadata -- otherwise it will be very difficult for users to figure out when the app will work and when it will fail

Hello, I've recently taken up a new job that is absorbing most of my time; thus I don't believe I will be able to offer work on this at a meaningful pace. I would be happy to offer assistance to anyone else who wanted to work on this at another time tho

jku commented

I have an untested branch of offline (non-expiry-respecting) mode in develop...jku:python-tuf:offline-mode -- It's based on the work in the existing PR and I think it looks complete now but the next step is writing a bunch of tests and seeing if that's true...

I can take a look at it tonight

This has been implemented in #2472, but has not been merged for the reasons summarised in #2472 (comment).

In short: there is little advantage of using TUF in offline mode, over not using TUF at all. To leverage that advantage, other work is required, which is not a priority at the moment.