carrotsearch/hppc

Implement WormSet [HPPC-188]

Closed this issue · 4 comments

WormMap was introduced in #218.

This Jira issue is to not forget to add WormSet (derived from WormMap code).

Also remind that currently the benchmark for WORM is not accurate for Set (it is internally creating a HashSet implementation). It is accurate for Map.


Issue: HPPC-188 (migrated from JIRA), created by Bruno Roustant (@bruno-roustant), resolved Jul 31 2023

Comment by Bruno Roustant (@bruno-roustant) (migrated from JIRA)

Of course I'm volunteer to provide a PR soon.

Comment by Dawid Weiss (@dweiss) (migrated from JIRA)

+1 :)

Comment by Bruno Roustant (@bruno-roustant) (migrated from JIRA)

PR added: #11

I ran the benchmarks. They confirm WormSet should be used when there are less than 2M elements, above the perf gain on contains() diminishes while the loss on add() increases.

At the price of a slower add(), perf -30% on average, then contains() is faster and stable with the load, perf +20% on average. Also, IntWormSet uses on average 10% more memory (25% more but the internal array enlarges later).

 

Benchmark for contains():

 

B006_HashSet_Contains_Random.contains

Internal array capacity: 2^20

HPPC:
Load   Score
0.45   0.569   ± 0.144 s/op
0.5    0.568   ± 0.127 s/op
0.55   0.533   ± 0.024 s/op
0.6    0.554   ± 0.009 s/op
0.65   0.635   ± 0.071 s/op
0.7    0.637   ± 0.041 s/op
0.74   0.629   ± 0.058 s/op

WORM:
Load   Score
0.45   0.438   ± 0.063 s/op
0.5    0.493   ± 0.228 s/op
0.55   0.458   ± 0.037 s/op
0.6    0.493   ± 0.186 s/op
0.65   0.460   ± 0.082 s/op
0.7    0.472   ± 0.093 s/op
0.74   0.486   ± 0.052 s/op

FASTUTIL:
Load   Score
0.45   0.482   ± 0.136 s/op
0.5    0.484   ± 0.060 s/op
0.55   0.508   ± 0.039 s/op
0.6    0.667   ± 0.274 s/op
0.65   0.603   ± 0.174 s/op
0.7    0.633   ± 0.078 s/op
0.74   0.652   ± 0.133 s/op

KOLOBOKE:
Load   Score
0.45   0.461   ± 0.043 s/op
0.5    0.517   ± 0.132 s/op
0.55   0.520   ± 0.056 s/op
0.6    0.542   ± 0.042 s/op
0.65   0.614   ± 0.075 s/op
0.7    0.647   ± 0.115 s/op
0.74   0.671   ± 0.133 s/op

 

 

Benchmark for add():

 

B005_HashSet_Add_Random.add

HPPC:
MB_of_keys  Score
1           0.022 ± 0.006 s/op
2           0.049 ± 0.003 s/op
3           0.080 ± 0.003 s/op
4           0.125 ± 0.006 s/op

WORM:
MB_of_keys  Score
1           0.029 ± 0.002 s/op
2           0.068 ± 0.004 s/op
3           0.106 ± 0.006 s/op
4           0.184 ± 0.007 s/op

FASTUTIL:
MB_of_keys  Score
1           0.022 ± 0.001 s/op
2           0.051 ± 0.007 s/op
3           0.079 ± 0.006 s/op
4           0.125 ± 0.004 s/op

KOLOBOKE:
MB_of_keys  Score
1           0.024 ± 0.002 s/op
2           0.053 ± 0.004 s/op
3           0.084 ± 0.003 s/op
4           0.139 ± 0.004 s/op

``

 

Comment by Bruno Roustant (@bruno-roustant) (migrated from JIRA)

@dweiss could you close this issue?