vigna/Sux4J

Cannot retrieve provided arbitrary values when querying GOVMinimalPerfectHashFunction

Closed this issue · 2 comments

pkolb commented

When creating a GOVMinimalPerfectHashFunction from a ChunkedHashStore, there is no difference in using add(<T>) versus add(<T>, long) when building the ChunkedHashStore. In other words, I cannot retrieve the values that I provided when querying the GOVMinimalPerfectHashFunction with getLong(<T>).

ChunkedHashStore<CharSequence> chunkedHashStore = new ChunkedHashStore<>(
                TransformationStrategies.utf16());
chunkedHashStore.add("one", 1);
chunkedHashStore.add("ten", 10);
chunkedHashStore.add("four", 4);
chunkedHashStore.check();
GOVMinimalPerfectHashFunction<CharSequence> mph = 
                new GOVMinimalPerfectHashFunction.Builder<CharSequence>()
                        .store(chunkedHashStore)
                        .build();
System.out.println("one: "+mph.getLong("one"));
System.out.println("ten: "+mph.getLong("ten"));
System.out.println("four: "+mph.getLong("four"));

The above code prints

one: 0
ten: 2
four: 1

instead of the expected

one: 1
ten: 10
four: 4

The javadoc for GOVMinimalPerfectHashFunction states that

at construction time you can pass a ChunkedHashStore containing the keys (associated with any value)

Did I miss something?
(I'm using sux4j-4.0.0 with Java 8.)

vigna commented

Yes. If you want assign values, use a GOV3Function or a GOV4Function. A minimal perfect hash associates to each key a distinct index in the range from 0 to the number of keys, and you cannot choose the index for a specific key.

Indeed, you can pass a ChunkedHashStore containing any value: the values will be ignored (indeed, the documentation explains that in case the store needs to be rebuilt it will contain the ordinal position of each key).

pkolb commented

Ah, I see. Thanks for the quick answer!