trie using integer keys?
greg7mdp opened this issue · 12 comments
Hi there, just a question, do you have a trie storing integer keys?
Thanks, - greg
If the key's somehow have some coherence (could be ip's (sub-nets)) then you could break down the int in 4 chars and use that as a key (i.e. convert the int notionally to string, in C++17 you can just (after resizing) the int with std::memcpy
into a std::string
(legalized type-punning)). The memcpy will be optimized away and the binary code will do what you would like it to do.
Thank you very much, @degski ! I just realized this hat-trie is not ordered which I need.
Hi,
The current HAT-Trie, in addition of not being ordered, may also not be the most efficient for storing integers, even when split by byte, it's more specialized for strings. I know there is the Y-fast trie but I never tested any implementation. There is also nedtries for integers but I never tested it either.
Thibaut
Thank you very much, @degski ! I just realized this hat-trie is not ordered which I need.
I suspected that, that's what I meant with 'coherence'. A trie (in its simplest form, not this one) stores words in trees with the letters (char's) as ever deepening keys of the tree's nodes, so all the words with a 2nd letter 'b' f.e. are in a sub-tree with root key 'b', and so on.
Just use a hash tree, if the keys are 'random' enough, particularly the low bits, you might get away with just using the identity function for hashing into the hash map, which would make it very cheap.
thanks @degski, would a hash tree preserve ordering? Do you have an example of this?
Even when using the identity function for hash, the modulo would mess up the ordering right?
Thanks - greg
Thank you very much, @degski ! I just realized this hat-trie is not ordered which I need.
It's ordered, but lexicographically, that's why it applies to strings.
thanks @degski, would a hash tree preserve ordering? Do you have an example of this?
What order do you want to preserve, you did not define that?
Even when using the identity function for hash, the modulo would mess up the ordering right?
No, that's what the hash tree will do as well, as it grows it will 'consume' more bits of your key. It is the low bits that matter the most. If for instance you have many instances of the int with the same low byte than you will create an excessive number of collisions on that slot, which means it will need to seek a free alternative location and do the bookkeeping of that, so that's relatively expensive and should be avoided. In such a case you are better of, using a std::set
or a std::map
. For smallish size a std::map
beats a hash map any day of the week, so not all is lost.
I want to use integer keys (for example uint32_t). I need to be able to traverse the tree in sorted fashion (< for uint32_t), and to query the min and max integer keys at any time. It should also be conservative with memory usage.
I want to use integer keys (for example uint32_t). I need to be able to traverse the tree in sorted fashion (< for uint32_t), and to query the min and max integer keys at any time.
std::map
.
or better, create an intrusive map with std::set
, i.e. the 'key' consists of a k-v-pair.
yes, I know. Actually cpp-btree or abseil btree are better. But I was looking for even faster lookups.
or better, create an intrusive map with std::set, i.e. the 'key' consists of a k-v-pair
isn't that what std::map does internally?
or better, create an intrusive map with std::set, i.e. the 'key' consists of a k-v-pair
isn't that what std::map does internally?
The answer is probably yes, but you'll be depending on implementation. This is no problem, it's totally valid to write code that only runs on Windows, i.e. to depend on the implementation or OS extensions, it all depends upon what you want.