bitwalker/libring

:erlang.phash2 does not uniformly distribute atoms across the hash range, should this be documented in README?

jlerche opened this issue · 3 comments

This can be verified pretty easily in the shell, but there's an erlang mailing list thread that gives examples but here's salient bit

2> erlang:phash2(aaa).
26481
3> erlang:phash2(bbb).
26754
4> erlang:phash2(ccc).
27027

The takeaway seems to be it works great for any term except atoms. One possibility is to call :erlang.term_to_binary on the atom, or alternatively wrap it in a tuple or list.

I'm fine with doing a pre-process step via :erlang.term_to_binary/1 if that actually solves the problem, otherwise I suppose we will need a new hash function

That should be ok as :erlang.term_to_binary is pretty fast right? Here's some testing I did

iex(1)> :erlang.phash2(:foo)
27999
iex(2)> :erlang.phash2(:erlang.term_to_binary(:foo))
86811783
iex(3)> :erlang.phash2(:erlang.term_to_binary(:fooaaaa))
43291300
iex(4)> :erlang.phash2(:erlang.term_to_binary(:ffgfgff))
30215940
iex(5)> :erlang.phash2(:erlang.term_to_binary(:a))      
64216263
iex(6)> :erlang.phash2(:erlang.term_to_binary(:aaaaaa))
86861957
iex(7)> :erlang.phash2(:erlang.term_to_binary(:b))     
14428281
iex(8)> :erlang.phash2(:erlang.term_to_binary(:b), trunc(:math.pow(2,32)-1))
3235653753
iex(9)> :erlang.phash2(:foo, trunc(:math.pow(2,32)-1))                      
27999
iex(10)> :erlang.phash2(:erlang.term_to_binary(:foo), trunc(:math.pow(2,32)-1))
355247239

which is a lot closer to expected behavior.

Is this why elixir HEAD is currently generating one test failure?

  1. test ensure function clauses are ordered (ModuleTest)
    test/elixir/module_test.exs:289
    Assertion with == failed
    code: assert :erlang.phash2(atoms) == 61635213
    left: 60755908
    right: 61635213
    stacktrace:
    test/elixir/module_test.exs:297: (test)