amakelov/mandala

Assigning output UIDs

amakelov opened this issue · 1 comment

We have two options for assigning UIDs to function outputs:

  • content hashing: once the function computes its outputs, they are content
    hashed and this is their UID.
    • good: calls that end up computing the same thing (rare, but not
      vanishingly so) do not duplicate storage; accidentally unwrapping and
      then re-wrapping an output won't mess up the UID.
    • bad: takes time for large objects; objects no longer have a unique
      history (so, for example, you can't generate a unique piece of code that
      led to this output)
  • causal hashing: the call UID is combined with something that identifies the
    output, e.g. its index among the function's outputs, to obtain a new UID.
    • good: each value has a unique history; fast to compute
    • bad: you could end up in a situation where you break the chain of
      relations between things if you unwrap an output (which will lose the causal
      UID), and then wrap it again (which will assign it a content-based UID).
      This means that the system treats the two values as different, so you could
      end up computing the same things twice, and your relational queries will be
      broken.
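
To make the trade-off concrete, here is a minimal sketch of both schemes. It is not mandala's actual implementation: `content_uid`, `causal_uid`, the use of `pickle` plus SHA-256, and the call UID strings are all assumptions made up for illustration.

```python
import hashlib
import pickle


def content_uid(value) -> str:
    """Hypothetical content hash: the UID depends only on the value's bytes.

    pickle is used here only for brevity; it is not a canonical
    serialization for arbitrary objects.
    """
    return hashlib.sha256(pickle.dumps(value)).hexdigest()


def causal_uid(call_uid: str, output_index: int) -> str:
    """Hypothetical causal hash: the UID depends only on which call produced
    the output and the output's position, never on the value itself."""
    return hashlib.sha256(f"{call_uid}:{output_index}".encode()).hexdigest()


# Two different calls (made-up call UIDs) that happen to compute the same value.
value_from_call_1 = [1, 2, 3]
value_from_call_2 = [1, 2, 3]

# Content hashing: equal values share a UID, so storage is not duplicated.
same_content = content_uid(value_from_call_1) == content_uid(value_from_call_2)

# Causal hashing: each output keeps its own history, so the UIDs differ.
same_causal = causal_uid("call-1", 0) == causal_uid("call-2", 0)

# Unwrap then re-wrap: the causal UID is lost and the re-wrapped value gets a
# content-based UID, so the system sees two different values.
original_uid = causal_uid("call-1", 0)
rewrapped_uid = content_uid(value_from_call_1)
chain_broken = original_uid != rewrapped_uid
```

In this sketch `same_content` is true, `same_causal` is false, and `chain_broken` is true, which mirrors the "good" and "bad" points listed above. Note that `content_uid` has to serialize the whole value, which is where the cost for large objects comes from.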

It's very easy to add a config option and switch between the two, but I think
it'd be good to figure out what we want to go for here in terms of clarity.
Seems like content hashing is a safer bet in terms of transparency and avoiding
"broken" state?
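
For what the switch might look like, here is a rough sketch. The `Config` class, the `output_uid_scheme` field, and `assign_output_uid` are hypothetical names, not existing mandala options.

```python
import hashlib
import pickle
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical option name; "content" or "causal".
    output_uid_scheme: str = "content"


def assign_output_uid(config: Config, value, call_uid: str, output_index: int) -> str:
    """Pick the UID for one output according to the configured scheme."""
    if config.output_uid_scheme == "content":
        # UID from the value's bytes (pickle used only for brevity).
        payload = pickle.dumps(value)
    elif config.output_uid_scheme == "causal":
        # UID from the producing call and the output's position.
        payload = f"{call_uid}:{output_index}".encode()
    else:
        raise ValueError(f"unknown scheme: {config.output_uid_scheme!r}")
    return hashlib.sha256(payload).hexdigest()


content_cfg = Config(output_uid_scheme="content")
causal_cfg = Config(output_uid_scheme="causal")

# Same value produced by two different (made-up) calls.
uid_a = assign_output_uid(content_cfg, "result", "call-1", 0)
uid_b = assign_output_uid(content_cfg, "result", "call-2", 0)
uid_c = assign_output_uid(causal_cfg, "result", "call-1", 0)
uid_d = assign_output_uid(causal_cfg, "result", "call-2", 0)
```

With the content scheme `uid_a` and `uid_b` coincide; with the causal scheme `uid_c` and `uid_d` differ. One caveat with a switch like this: UIDs produced under one scheme would not match those produced under the other, so flipping it on existing storage would likely need care.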

Closing this as we decided on content hashing for now!