:erlang.phash2 does not uniformly distribute atoms across the hash range, should this be documented in README?
jlerche opened this issue · 3 comments
jlerche commented
This can be verified pretty easily in the shell, but there's an erlang mailing list thread that gives examples but here's salient bit
2> erlang:phash2(aaa).
26481
3> erlang:phash2(bbb).
26754
4> erlang:phash2(ccc).
27027
The takeaway seems to be it works great for any term except atoms. One possibility is to call :erlang.term_to_binary
on the atom, or alternatively wrap it in a tuple or list.
bitwalker commented
I'm fine with doing a pre-process step via :erlang.term_to_binary/1
if that actually solves the problem, otherwise I suppose we will need a new hash function
jlerche commented
That should be ok as :erlang.term_to_binary
is pretty fast right? Here's some testing I did
iex(1)> :erlang.phash2(:foo)
27999
iex(2)> :erlang.phash2(:erlang.term_to_binary(:foo))
86811783
iex(3)> :erlang.phash2(:erlang.term_to_binary(:fooaaaa))
43291300
iex(4)> :erlang.phash2(:erlang.term_to_binary(:ffgfgff))
30215940
iex(5)> :erlang.phash2(:erlang.term_to_binary(:a))
64216263
iex(6)> :erlang.phash2(:erlang.term_to_binary(:aaaaaa))
86861957
iex(7)> :erlang.phash2(:erlang.term_to_binary(:b))
14428281
iex(8)> :erlang.phash2(:erlang.term_to_binary(:b), trunc(:math.pow(2,32)-1))
3235653753
iex(9)> :erlang.phash2(:foo, trunc(:math.pow(2,32)-1))
27999
iex(10)> :erlang.phash2(:erlang.term_to_binary(:foo), trunc(:math.pow(2,32)-1))
355247239
which is a lot closer to expected behavior.
jordanhubbard commented
Is this why elixir HEAD is currently generating one test failure?
- test ensure function clauses are ordered (ModuleTest)
test/elixir/module_test.exs:289
Assertion with == failed
code: assert :erlang.phash2(atoms) == 61635213
left: 60755908
right: 61635213
stacktrace:
test/elixir/module_test.exs:297: (test)