google/zerocopy

In derives, prefix generated idents with a hash of the input token stream to prevent name collisions

Opened this issue · 1 comments

We've already encountered name collisions with our derive-generated code. We can make this less likely by using unique names, but we're still worried about making sure that our code "breaks correctly" in the presence of name collisions.

I propose a more reliable solution: Use a collision-resistant hash function to hash the entire token stream input to our derive, and use the resulting hash as a prefix for all idents that we generate. This is robust since a name collision would require finding a fixed point of a collision-resistant hash function, which is something believed to be computationally infeasible. If we took this approach, instead of worrying about testing for the behavior of our derives in the face of name collisions, we could just ignore that possibility entirely.

We probably don't need to take any external dependencies for this - I believe that the standard library's SipHasher is sufficient. It's not a full CRHF in the formal cryptographic sense, but it nonetheless has reasonable formal properties:

SipHash instead guarantees that, having seen Xi and SipHash(Xi, k), an attacker who does not know the key k cannot find (any information about) k or SipHash(Y, k) for any message Y ∉ {Xi} which they have not seen before.

I'll need to think about this more to convince myself that this property is sufficient for our purposes, but intuitively I'm pretty sure it's enough (and of course, even if we're vulnerable to you shooting yourself in the foot if you try really really hard, that's fine).

I've started to draft an implementation in #1726; it's only the beginning bits so far.