Tessil/hat-trie

[feature request] find longest prefix

Dobatymo opened this issue · 7 comments

I am missing a common feature of Trie APIs: searching for the longest string in the trie which is a prefix of input.

Example in pseudo code:

hattriemap = {"a", "as", "asdf"}
hattriemap.longest_prefix("asd") -> "as"

Is this possible with hat-tries? If yes, it would be a great addition to your already great library.

Hello,

It would effectively be a nice addition and could be done with a HAT-trie. I will see to implement it but it may take a bit of time.

Hello,

I added a branch with the new method. I still have to add some tests before merging the branch to the master.

I corrected a bug in the implementation and merged the change to the master.

Warn me if there is any problem.

Thank you a lot for implementing this!
But I have a small question. If the iterator returns more than one result, what is the meaning of these other values?
For example in my initial example the iterator returns: ["as", "asdf"]

I'm confused, the method only returns one value (an iterator), not a range of values (two iterators). It works the same way as find, the iterator points to the matching value, but if you increment it you'll have other non-matching values.

tsl::htrie_map<char> map = {{"a", 1}, {"as", 1}, {"asdf", 1}};

auto it = map.longest_prefix("asd");
if(it != map.end()) {
    // "as"
    std::cout << it.key() << std::endl;
}

Sorry for the syntax. I meant if I go through the iterator it will return "as" first, then "asdf" and then end(). So basically I should just ignore all except the first result as they don't have any meaning? (like 2nd longest, 3rd longest, which is not the case here, but as an example)
In that case, it works perfectly and I didn't see any problems so far.

Yes, you're not supposed to increment the iterator.

The API is based on the design of the STL. For example the method find of std::unordered_map will return an iterator to the element if it's found, otherwise end(). If you increment the returned iterator until end(), you'll get values that don't match the value passed to find.