ngclient: support `StorageBackendInterface`?
Description of issue or feature request:
Right now, `tuf.ngclient` is heavily tied to local system I/O: it assumes a metadata directory on disk that can be read/written. For example:
python-tuf/tuf/ngclient/updater.py
Lines 293 to 312 in 4d2ff8d
This is problematic in distributed worker setups like Warehouse (PyPI), where each worker has its own container/entire VM and thus can't easily share on-disk TUF repos. In particular, this causes both reliability and security concerns:
- Reliability: an unfortunate corruption in a single worker's TUF repo results in a hard-to-diagnose flaky worker, since each worker has its own copy of the repo.
- Security: each worker's TUF repo is independently stored on a (machine-local) disk, making them harder to audit.
This problem was noted a few years back, before `tuf.ngclient` was created: #1009. The solution then was to add a filesystem abstraction to the `tuf.metadata` APIs, which was done via secure-systems-lab/securesystemslib#232 and #1009. However, this abstraction wasn't added to the `ngclient` APIs, only to the low-level `metadata` ones.
Current behavior:
`tuf.ngclient` currently assumes that it can perform persistent local I/O for its repository.
Expected behavior:
`tuf.ngclient` should support an I/O abstraction (such as the pre-existing `StorageBackendInterface`, if suitable) for persistent repo operations, enabling use in distributed deployments.
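To make the request concrete, here is a minimal sketch of what such an abstraction could look like. The `MetadataStore` protocol and both implementations are hypothetical names invented for this illustration; they are not part of `tuf` or `securesystemslib`, and `StorageBackendInterface` itself may have a different shape.

```python
import os
import tempfile
from typing import Dict, Protocol


class MetadataStore(Protocol):
    """Hypothetical minimal interface; NOT an existing tuf/securesystemslib API."""

    def load(self, name: str) -> bytes: ...

    def store(self, name: str, data: bytes) -> None: ...


class FilesystemStore:
    """Local-disk backend: one file per metadata name in a directory."""

    def __init__(self, directory: str) -> None:
        self._dir = directory

    def load(self, name: str) -> bytes:
        with open(os.path.join(self._dir, name), "rb") as f:
            return f.read()

    def store(self, name: str, data: bytes) -> None:
        # Write to a temporary file and rename it into place, so a crash
        # cannot leave a partially written metadata file behind.
        fd, tmp_path = tempfile.mkstemp(dir=self._dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, os.path.join(self._dir, name))


class InMemoryStore:
    """Stand-in for a shared backend (object store, database, ...)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def load(self, name: str) -> bytes:
        try:
            return self._blobs[name]
        except KeyError:
            # Mirror the filesystem backend's behaviour for missing entries.
            raise FileNotFoundError(name) from None

    def store(self, name: str, data: bytes) -> None:
        self._blobs[name] = data
```

With something like this, the updater would only ever call `load` and `store`, and a deployment such as Warehouse could supply a backend that all workers share.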
I think the expected behaviour sounds reasonable.
There is a related question to consider -- in a scenario where you have "distributed workers", maybe what you really want is a bunch of "read-only" workers that operate without ever connecting to the repository (at least for metadata), and one writing tuf client that actually does the updates at regular intervals.
Previously we tried to make an offline mode that would be user friendly -- usable by CLI apps -- and that turned out to be complicated (compared to the potential advantages). The "offline mode" described above (where it's ok to just immediately fail if the local metadata is not up-to-date and someone promises to keep it updated) would be simple to add.
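As a rough illustration of the "fail immediately" part, a read-only worker could check the expiry of the metadata it was handed and refuse to continue if it is stale. `require_fresh` and `StaleMetadataError` are hypothetical names for this sketch, not ngclient APIs, and the sketch assumes the usual TUF JSON layout with a `signed.expires` timestamp. Signature verification is deliberately left out here; a real client would still have to do it.

```python
import json
from datetime import datetime, timezone
from typing import Optional


class StaleMetadataError(Exception):
    """Hypothetical error: local metadata has expired and must not be used."""


def require_fresh(raw_metadata: bytes, now: Optional[datetime] = None) -> dict:
    """Return the 'signed' part of the metadata, or fail fast if it has expired.

    Only the expiry check is shown; signatures are NOT verified in this sketch.
    """
    now = now or datetime.now(timezone.utc)
    signed = json.loads(raw_metadata)["signed"]
    # Assumes expiry timestamps of the form "2030-01-01T00:00:00Z" (UTC).
    expires = datetime.strptime(
        signed["expires"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if expires <= now:
        role = signed.get("_type", "metadata")
        raise StaleMetadataError(f"{role} expired at {signed['expires']}")
    return signed
```

The worker never contacts the repository in this mode: if the writer has fallen behind, the error surfaces right away instead of triggering a refresh.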
"dumb read-only mode" or IO abstraction (or both) sound like things that could be added as optional features to ngclient.
- Abstracting the metadata IO should be straightforward: something still needs to take care of filename encoding, but nothing should be visible to the API user (apart from the added optional argument for `StorageBackendInterface` or something)
- Abstracting IO in `find_cached_target` and `download_target` should work as well; we'll just need to make sure the optional filepath argument still makes sense -- likely it only makes sense with the default filesystem implementation
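The filename-encoding point could be handled by keeping the encoding in the client and handing the backend an opaque key. The sketch below uses percent-encoding as one possible scheme; `metadata_key` and `persist_metadata` are hypothetical helpers, and the backend is assumed to expose a `store(name, data)` method (also an assumption of this example, not an existing interface).

```python
from urllib import parse


def metadata_key(rolename: str) -> str:
    """Map a role name to a flat storage key.

    Everything outside the unreserved characters is percent-encoded,
    including "/", so a delegated role name cannot introduce path
    separators into the key.
    """
    return f"{parse.quote(rolename, '')}.json"


def persist_metadata(store, rolename: str, data: bytes) -> None:
    """Write metadata through any backend offering store(name, data)."""
    store.store(metadata_key(rolename), data)
```

Because the backend only ever sees the encoded key, the same encoding works for a directory on disk and for a shared store alike.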
> There is a related question to consider -- in a scenario where you have "distributed workers", maybe what you really want is a bunch of "read-only" workers that operate without ever connecting to the repository (at least for metadata), and one writing tuf client that actually does the updates at regular intervals
Thanks for extrapolating this! This is indeed the underlying scenario, and probably is a more accurate encapsulation of what I actually need 🙂