theupdateframework/python-tuf

Investigate metadata scalability

trishankkarthik opened this issue · 5 comments

How does the implementation plan to handle metadata for a software update repository with a large number of targets and target delegations? At present, it looks like the uncompressed metadata will grow quite large once the number of targets and delegations is big enough.

A few solutions:

  1. Compress metadata with standard techniques (e.g. gzip); a rough sketch follows this list.
  2. Investigate metadata difference schemes.
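
As a rough illustration of option 1, the sketch below builds a synthetic targets dictionary (the field layout only loosely imitates TUF targets metadata) and compares its raw JSON size against gzip, using only the Python 3 standard library:

```python
import gzip
import json

# Synthetic targets metadata with many entries; the field layout here only
# loosely imitates TUF targets metadata and is not meant to be exact.
targets = {
    "targets/pkg-%05d.tar.gz" % i: {
        "length": 1024 + i,
        "hashes": {"sha256": "%064x" % i},
    }
    for i in range(10000)
}
metadata = {"_type": "Targets", "version": 1, "targets": targets}

raw = json.dumps(metadata).encode("utf-8")
compressed = gzip.compress(raw)  # gzip.compress() requires Python 3.2+

print("uncompressed: %d bytes" % len(raw))
print("gzip:         %d bytes (%.1f%% of original)"
      % (len(compressed), 100.0 * len(compressed) / len(raw)))
```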

#44 will give us some data about this issue.

Things we need to do efficiently: download only the subset of targets metadata relevant to the target file in question, and download as much as possible in as few HTTP requests as possible.
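
One possible shape for the first point is a client-side delegation walk that only downloads roles whose path patterns can match the requested target. This is only a sketch: `fetch_metadata` is a hypothetical stand-in for the client's real download-and-verify step, and the field names loosely follow TUF targets/delegations metadata.

```python
import fnmatch

def fetch_metadata(rolename):
    """Hypothetical stand-in for downloading and verifying <rolename>.json;
    the real client would return the parsed, signature-checked role here."""
    raise NotImplementedError

def find_target(target_path, rolename="targets"):
    """Walk the delegation tree, downloading only roles whose path patterns
    could contain target_path, and return its file info (or None)."""
    role = fetch_metadata(rolename)

    # The target may be listed directly in this role.
    if target_path in role.get("targets", {}):
        return role["targets"][target_path]

    # Otherwise recurse only into delegations whose patterns match,
    # skipping every other branch (and every other HTTP request).
    for delegated in role.get("delegations", {}).get("roles", []):
        if any(fnmatch.fnmatch(target_path, p) for p in delegated.get("paths", [])):
            info = find_target(target_path, delegated["name"])
            if info is not None:
                return info
    return None
```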

See #57 for a method to reduce metadata size in the common case where a delegated role is trusted with wildcard target paths.
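
Roughly, the saving there comes from the parent role trusting the delegated role for a path pattern instead of enumerating every delegated path. A toy comparison (the field names are only illustrative of a TUF delegation entry):

```python
import json

# Every delegated path enumerated explicitly in the parent role:
explicit = {
    "name": "django",
    "paths": ["targets/django/django-1.%02d.tar.gz" % i for i in range(100)],
}

# The same trust expressed with a single wildcard pattern:
wildcard = {
    "name": "django",
    "paths": ["targets/django/*"],
}

print(len(json.dumps(explicit)), "vs", len(json.dumps(wildcard)), "bytes")
```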

Maybe consider binary data exchange formats, such as Protocol Buffers or Cap'n Proto.
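
As a very rough size intuition (this uses a hand-rolled `struct` encoding as a stand-in, not Protocol Buffers or Cap'n Proto themselves), a single target entry packed as fixed-width binary is much smaller than its JSON text:

```python
import json
import struct

length = 4096
sha256 = bytes.fromhex("ab" * 32)

# The same entry as JSON text...
as_json = json.dumps({"length": length,
                      "hashes": {"sha256": sha256.hex()}}).encode("utf-8")

# ...and as a fixed-width binary record: 8-byte length plus the raw 32-byte digest.
as_binary = struct.pack("<Q32s", length, sha256)

print("JSON:   %d bytes" % len(as_json))
print("binary: %d bytes" % len(as_binary))
```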

The tentatively named "lazy bin walk" scheme to address metadata scalability is discussed in our design document for PyPI+TUF.
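
The core of the hashed-bin idea there is that targets are delegated to one of many bins keyed by a hash prefix of the target path, so a client only fetches the single small bin that could list its target. A minimal sketch, assuming 4096 bins and SHA-256 over the target path (the actual parameters and naming are whatever the design document specifies):

```python
import hashlib

# Assumed here for illustration; the PyPI+TUF design document fixes the
# real bin count and naming convention.
BIN_PREFIX_LEN = 3                 # hex digits of the hash used as the bin name
NUM_BINS = 16 ** BIN_PREFIX_LEN    # 4096 bins

def bin_for_target(target_path):
    """Map a target path to the hashed-bin role that would list it."""
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    return "bin-" + digest[:BIN_PREFIX_LEN]

# A client only downloads the one (small) bin that can contain its target,
# instead of targets metadata covering every project on the repository.
print(bin_for_target("targets/packages/Django-1.5.tar.gz"))
```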