Assigning output UIDs

Question

amakelov opened this issue 3 years ago · 1 comment

There are two options for assigning UIDs to a function's outputs:

content hashing: once the function computes its outputs, each output is
hashed by its content, and that hash becomes its UID.
- good: calls that end up computing the same value (rare, but not vanishingly
  so) do not duplicate storage; accidentally unwrapping and then re-wrapping
  an output leaves its UID unchanged.
- bad: hashing takes time for large objects; values no longer have a unique
  history, since different computations can produce the same UID (so, for
  example, you can't generate the unique piece of code that led to a given
  output).
causal hashing: the call UID is combined with, e.g., the index of the output
among the function's outputs to obtain a new UID.
- good: each value has a unique history; fast to compute, since the cost
  doesn't depend on the size of the value.
- bad: you can break the chain of relations between values by unwrapping an
  output (which loses its causal UID) and then wrapping it again (which
  assigns it a content-based UID). The system then treats the two as
  different values, so you could end up computing the same things twice, and
  your relational queries will be broken.
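The trade-offs above can be seen in a small sketch. It assumes SHA-256 over `pickle` bytes for content hashing and SHA-256 over `call_uid:index` for causal hashing; `Ref`, `wrap`, `content_uid` and `causal_uid` are hypothetical names for illustration, not the project's actual API.

```python
import hashlib
import pickle
from dataclasses import dataclass
from typing import Any


@dataclass
class Ref:
    """Hypothetical wrapper pairing a value with its UID."""
    uid: str
    obj: Any


def content_uid(obj: Any) -> str:
    # Content hashing: the UID depends only on the value itself.
    # Assumes pickle yields the same bytes for equal values; this holds for
    # simple built-ins like ints and lists but is not guaranteed in general.
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


def causal_uid(call_uid: str, output_index: int) -> str:
    # Causal hashing: the UID depends only on which call produced the value
    # and which of its outputs it is, never on the value itself.
    return hashlib.sha256(f"{call_uid}:{output_index}".encode()).hexdigest()


def wrap(obj: Any) -> Ref:
    # A raw value has no call to point back to, so wrapping it can only
    # fall back on its content.
    return Ref(uid=content_uid(obj), obj=obj)


# Two different (hypothetical) calls that happen to compute the same output.
out = [1, 2, 3]
call_a, call_b = "call-a", "call-b"

# Content hashing: both outputs share one UID, so storage is deduplicated.
content_a = Ref(content_uid(out), out)
content_b = Ref(content_uid(out), out)

# Causal hashing: each output keeps a UID tied to the call that produced it.
causal_a = Ref(causal_uid(call_a, 0), out)
causal_b = Ref(causal_uid(call_b, 0), out)

# Unwrap and re-wrap: the content UID survives, the causal UID does not.
rewrapped = wrap(causal_a.obj)
```

With content hashing, `content_a` and `content_b` share a UID and re-wrapping is harmless. With causal hashing, `causal_a` and `causal_b` keep distinct histories, but `rewrapped` gets a content UID that no longer matches `causal_a`, which is the broken chain of relations described above.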

It's very easy to add a config option to switch between the two, but I think
it'd be good to settle on one approach for the sake of clarity. It seems like
content hashing is the safer bet in terms of transparency and avoiding
"broken" state?
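A config option like the one mentioned here might look roughly as follows. `OutputUIDMode` and `assign_output_uids` are hypothetical names, and the hashing choices (SHA-256 over `pickle` bytes versus over the call UID and output index) are assumptions rather than the project's implementation.

```python
import hashlib
import pickle
from enum import Enum
from typing import Any, List


class OutputUIDMode(Enum):
    """Hypothetical config option selecting how output UIDs are assigned."""
    CONTENT = "content"
    CAUSAL = "causal"


def assign_output_uids(call_uid: str, outputs: List[Any], mode: OutputUIDMode) -> List[str]:
    uids = []
    for index, value in enumerate(outputs):
        if mode is OutputUIDMode.CONTENT:
            # Hash the value itself; cost grows with the size of the value.
            data = pickle.dumps(value)
        else:
            # Hash only the call UID and output position; cost is
            # independent of the value's size.
            data = f"{call_uid}:{index}".encode()
        uids.append(hashlib.sha256(data).hexdigest())
    return uids


same_outputs = [42, "x"]
content_uids_a = assign_output_uids("call-a", same_outputs, OutputUIDMode.CONTENT)
content_uids_b = assign_output_uids("call-b", same_outputs, OutputUIDMode.CONTENT)
causal_uids_a = assign_output_uids("call-a", same_outputs, OutputUIDMode.CAUSAL)
causal_uids_b = assign_output_uids("call-b", same_outputs, OutputUIDMode.CAUSAL)
```

Under `CONTENT`, two calls that return equal outputs get identical UIDs; under `CAUSAL`, the same outputs get different UIDs because they came from different calls.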

Answer 1 · 2022-09-27T20:04:47.000Z

Closing this as we decided on content hashing for now!