abusix/ahocorapy

Question: what does the finalize method do?

amirouche opened this issue · 4 comments

Hello, I am looking at ahocorapy again. I still struggle to understand the point of the finalize method. I understand that it helps with performance, but I am not sure how.

I think I get it. If the search is initialized with ['c', 'bc', 'abc'], then given the string 'xxxxabc', a search done before the call to finalize will only match 'abc'.

Is that it?

I am not sure what problem it tries to solve.

https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm

The algorithm in its original form requires a finalization step, once all keywords have been added.
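To make the finalization step concrete, here is a minimal sketch of a textbook Aho-Corasick automaton. It is not ahocorapy's actual code: the names `TinyKeywordTree`, `Node`, `add`, `finalize` and `search_all` are chosen for illustration, and refusing to search before `finalize()` is an assumption of this sketch. The idea is that `add` only builds a plain trie, while `finalize` walks it breadth-first to compute failure links (where to continue after a mismatch) and to copy down keywords that end as suffixes of longer paths.

```python
from collections import deque


class Node:
    def __init__(self):
        self.children = {}  # char -> Node
        self.fail = None    # failure link, only set by finalize()
        self.outputs = []   # keywords that end at this state


class TinyKeywordTree:
    """Illustrative Aho-Corasick automaton (hypothetical, not ahocorapy)."""

    def __init__(self):
        self.root = Node()
        self.finalized = False

    def add(self, keyword):
        # Before finalization the structure is just a trie of keywords.
        if self.finalized:
            raise ValueError("cannot add keywords after finalize()")
        node = self.root
        for ch in keyword:
            node = node.children.setdefault(ch, Node())
        node.outputs.append(keyword)

    def finalize(self):
        # Breadth-first pass: each failure link points to the state for the
        # longest proper suffix of the node's path that is also in the trie.
        self.root.fail = self.root
        queue = deque()
        for child in self.root.children.values():
            child.fail = self.root
            queue.append(child)
        while queue:
            node = queue.popleft()
            for ch, child in node.children.items():
                fallback = node.fail
                while fallback is not self.root and ch not in fallback.children:
                    fallback = fallback.fail
                child.fail = fallback.children.get(ch, self.root)
                # Inherit keywords that end at the suffix state.
                child.outputs = child.outputs + child.fail.outputs
                queue.append(child)
        self.finalized = True

    def search_all(self, text):
        # Single left-to-right pass; the text position never moves backwards.
        if not self.finalized:
            raise ValueError("call finalize() before searching")
        node = self.root
        for i, ch in enumerate(text):
            while node is not self.root and ch not in node.children:
                node = node.fail
            node = node.children.get(ch, self.root)
            for keyword in node.outputs:
                yield keyword, i - len(keyword) + 1


tree = TinyKeywordTree()
for kw in ["c", "bc", "abc"]:
    tree.add(kw)
tree.finalize()
print(list(tree.search_all("xxxxabc")))
# [('abc', 4), ('bc', 5), ('c', 6)]
```

With only the trie and no failure links, a walk over 'xxxxabc' would end in the 'abc' state and report just that keyword. 'bc' and 'c' are found because `finalize` propagated them onto that state. Since the links depend on the complete keyword set, adding a keyword afterwards would require recomputing them, which is why this sketch forbids it.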

We might consider using an incremental version mentioned in this paragraph:

The original Aho-Corasick algorithm assumes that the set of search strings is fixed. It does not directly apply to applications in which new search strings are added during application of the algorithm. An example is an interactive indexing program, in which the user goes through the text and highlights new words or phrases to index as he or she sees them. Bertrand Meyer introduced an incremental version of the algorithm in which the search string set can be incrementally extended during the search, retaining the algorithmic complexity of the original.

I will close this, as it's not really an issue IMO.