caleb531/automata

Aho-Corasick Algorithm

Closed this issue ยท 9 comments

See title, implement the Aho-Corasick algorithm for construction of a DFA which recognizes all strings which contain a substring from a given dictionary. This is an extension of the existing from_substring method that uses the KMP construction algorithm (and can probably replace that method if implemented, as the existing method is just a special case of the new one).

References:

cc @Tagl as you originally implemented the KMP algorithm. I'm not sure if there's a simple way to extend the existing implementation to the more general AC algorithm.

Tagl commented

I have a C++ implementation that has been fieldtested through programming contests, I can port and adapt to the library. Currently at EGOI but I promise to look into implementing when I'm back sometime soon after the 24th of July.

That would be awesome, thanks! Don't worry about the timing, with the v8 release I'm sure we'll have a few post-release items that will make it into a minor version.

@Tagl have you had a chance to take a crack at this yet? No worries if not, just wanted to check in ๐Ÿ˜ƒ

Tagl commented

I've reviewed the old implementation and refreshed my memory on how the algorithm works but haven't started writing anything so far.

Tagl commented

I'm planning on doing it this coming weekend.

Tagl commented

cc @Tagl as you originally implemented the KMP algorithm. I'm not sure if there's a simple way to extend the existing implementation to the more general AC algorithm.

Regarding this, I will probably start with a fresh rewrite, then try to adapt it to the old interface.

Done writing it up, seems to work, just some cleanup and tests left before making a PR

Sidenote: we should extend it for containing one of a set of subsequences using NFAs. It's a rather simple extension after all.

Closing since the PR was merged to develop.