Assigning output UIDs
amakelov opened this issue · 1 comments
amakelov commented
There are two choices we have for this:
- content hashing: once the function computes its outputs, they are content
  hashed and this is their UID.
  - good: calls that end up computing the same thing (a rare, but not
    vanishingly rare, event) do not duplicate storage; accidentally unwrapping
    and then re-wrapping an output won't mess up the UID.
  - bad: takes time for large objects; objects no longer have a unique
    history (so, for example, you can't generate a unique piece of code that
    led to this output)
- causal hashing: the call UID is combined with e.g. the index of the output
  of the function to obtain a new UID.
  - good: each value has a unique history; fast to compute
  - bad: you could end up in a situation where you break the chain of
    relations between things if you unwrap an output (which will lose the causal
    UID), and then wrap it again (which will assign it a content-based UID).
    This means that the system treats the two values as different, so you could
    end up computing the same things twice, and your relational queries will be
    broken.
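
To make the trade-off concrete, here is a minimal sketch of the two schemes. The names `content_uid` and `causal_uid` are hypothetical, and the use of `pickle` plus SHA-256 is only an assumption for illustration, not what the library actually does:

```python
import hashlib
import pickle


def content_uid(value) -> str:
    """Content hashing: the UID depends only on the value itself."""
    # Assumption: pickle is used as the serializer purely for illustration.
    return hashlib.sha256(pickle.dumps(value)).hexdigest()


def causal_uid(call_uid: str, output_index: int) -> str:
    """Causal hashing: the UID depends on the call and the output position."""
    return hashlib.sha256(f"{call_uid}:{output_index}".encode()).hexdigest()


value = [1, 2, 3]

# Content hashing: two calls that compute the same value share one UID,
# so storage is not duplicated.
assert content_uid(value) == content_uid([1, 2, 3])

# Causal hashing: the same value produced by two different calls gets two
# different UIDs, so each one keeps its own history.
uid_a = causal_uid("call-A", 0)
uid_b = causal_uid("call-B", 0)
assert uid_a != uid_b

# The unwrap/re-wrap hazard: a value that lost its causal UID and is wrapped
# again by content no longer matches the original UID.
assert content_uid(value) != uid_a
```

Note that `content_uid` has to serialize the whole object, which is where the cost for large objects comes from, while `causal_uid` only hashes a short string.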
It's very easy to add a config option and switch between the two, but I think
it'd be good to figure out what we want to go for here in terms of clarity.
Seems like content hashing is a safer bet in terms of transparency and avoiding
"broken" state?
amakelov commented
Closing this as we decided on content hashing for now!