mirage/digestif

Constant-time `eq` / `neq`

cfcs opened this issue · 20 comments

cfcs commented

First brought up here: #30 (comment)

see also my comment below: #30 (comment)

I'd be happy to adapt the code example I posted into a PR if you would like?
ping @dinosaure

cfcs commented

Duplicate of: #33

cfcs commented

Re: compare:
The traditional compare that returns 0 on equality and a negative/positive int depending on the difference is dangerous since sorting by these hashes can allow attackers to infer information about them.

If you just want a normal hashmap, you should probably use a lighter hash function, so I'm not sure if there's a good argument for keeping this functionality.

I tried to come up with a solution:

  1. One potential strategy would be to remove compare from the interface
  2. Another potential strategy is to cripple it so it can only return 0 and -1 (or 0 and 1), reflecting equality and inequality respectively, but without revealing the relative differences. This probably violates some invariants mandated by various sorting algorithms and lookup methods, so that is not so elegant.
  3. A third potential strategy would be to still allow sorting, but to compare something that does not reveal information about the contents.
  • This could for instance be a pointer value, but that doesn't work for bytes that moves around in the OCaml heap.
  • Digestif could also maintain an internal monotonically increasing counter and assign a unique id to each created t. This is not so pretty, and it could have implications such as allowing an attacker to learn the order in which hashes was created (but still an improvement over the current compare, and allows sorting).
  1. In the style of 3) Digestif could assign an internal unique random id. This has the downside of requiring a CSPRNG, and would either require expensive/complicated tests to check uniqueness, or a longer value (128 bit or whatever) to probabilistically avoid collisions.
    This approach - like 3) - has the downside of comparisons not being deterministic across execution of the application using digestif, so that could also create problems. A CSPRNG from nocrypto or whatever is also a bit of an annoying dependency to add.

Unfortunately these all have downsides, so I'm leaning towards 1) (removing compare entirely), or maybe renaming to unsafe_compare and attaching a docstring in the .mli that explains the problem.
What do you think?

Ok, on my side, I believe the best is to rename compare to unsafe_compare and put an appropriate documentation about that. Because an hash in digestif is a string, people can make a better compare function with it (like the third solution) but it's so weight to integrate it in digestif.

cfcs commented

@dinosaure unsafe_compare: sounds good to me. I still think it's important to provide the non-data-dependent eq/neq.

Ok, I will fix that.

Currently, digestif uses this function from the standard library about eq (then, neq is a wrapper with not operator).

Another potential strategy is to cripple it so it can only return 0 and -1 (or 0 and 1), reflecting equality and inequality respectively, but without revealing the relative differences. This probably violates some invariants mandated by various sorting algorithms and lookup methods, so that is not so elegant.

There is no cripple here, this is the expected behaviour. I don't think any sorting algorithm is relying on the absolute value of that integer. And if that's the case I will be very happy to break them without any trace of regret :p

From what I understand, because caml_string_equal short-circuits after finding the first differing byte, this function is not safe. A solution should be close to what OpenBSD do about that.

Hum wait, I think I haven't understood what you meant. If you meant having compare x y = compare y x = 1 we will indeed have a problem. Please forget the noise :p

I am not sure which kind of issue your are trying to solve and which property you want compare to have, but having a way to compare hashes is useful for Set.Make, which is, in general, very useful to have.

Would the following compare function be better under your constraints?

let compare x y = Int.compare (Hastbl.hash x) (Hashtbl.hash y)

I did this code for eq function: https://paste.isomorphis.me/D8h.

And @cfcs, good luck for the FIFA World Cup 👍 !

cfcs commented

@samoht Exactly, the suggestion was to prepare a broken compare, that would indeed not be good for ordered things like Set/Map. I'm not familiar with the internals of Hashtbl.hash works, and didn't have the implementation for caml_hash handy. My concern with using raw pointers directly was that they can change when things are moved from minor to major heap. Does caml_hash return a stable value even when the "real pointer" changes? If so, then that would indeed be an acceptable solution imho!

@dinosaure Cool, that's equivalent to what I proposed (using a ref instead of my TCO recursive function).
And thanks! :D

caml_hash produces a stable value and is based on murmur3 so I think it will meet your requirements.

So if nobody disagree to use @samoht 's solution, I will provide unsafe_compare = String.compare and compare a b = (Hashtbl.hash a) - (Hashtbl.hash b).

Preferably use int_compare than a subtraction to avoid overflow.

cfcs commented

@samoht Hmm, to me it seems like it hashes the string contents, so I don't think this fulfills our requirements for a safe interface.

@dinosaure I think the Hashtbl.hash approach is better than plain String comparison though, since at least it's seeded, so it makes it harder to accidentally screw up really bad.
What do you think about just exposing let unsafe_compare a b = Int.compare (Hashtbl.hash a) (Hashtbl.hash b) - I mean if people want lexographical sorting they can just extract the hashes to strings?

Currently, for some projects (ocaml-git), I really need a lexicographical order. So my idea is to provide unsafe_compare (String.compare) and compare (@samoht 's idea) like what I did in this PR: https://github.com/mirage/digestif/pull/37/files#diff-423b10df875b85675660fa99a16d0219

cfcs commented

@dinosaure ok, that's fair. The compare using Hashtbl.hash is not safe since it essentially hashes the hash and compares the two hashes, so the safety of that relies on the assumption that the murmurhash seed cannot be reconstructed by an attacker, which is a bit muddy IMHO. But if we document that in the mli I think that's a reasonable compromise.

Close by #37