termsuite/termsuite-core

Optimizing TermGatherer process time

Closed this issue · 1 comments

dcram commented

TermGatherer keeps getting slower as gathering complexity increases (compounds, derivates, prefixes, synonyms, etc.), and it is not well optimized.

Three optimization tasks are considered:

1. Do not process the same pair of terms twice

Currently, because there are several pair indexing keys (lemma-lemma and lemma-stem), and because classes overlap, a term pair can be processed several times.
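A minimal sketch of one way to avoid re-processing pairs, assuming terms can be identified by non-negative integer ids. The class `PairDeduplication` and its methods are hypothetical and not part of the TermSuite API. Each unordered pair is mapped to an order-independent key, and a set of seen keys is shared across all term classes, whichever indexing key produced them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; names are assumptions, not TermSuite classes.
class PairDeduplication {

    /** Order-independent key for a pair of non-negative term ids. */
    static long pairKey(int a, int b) {
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    /**
     * Enumerates the pairs of every term class, keeping each unordered
     * pair only the first time it is seen, even when classes overlap.
     */
    static List<int[]> uniquePairs(List<List<Integer>> termClasses) {
        Set<Long> seen = new HashSet<>();
        List<int[]> pairs = new ArrayList<>();
        for (List<Integer> termClass : termClasses) {
            for (int i = 0; i < termClass.size(); i++) {
                for (int j = i + 1; j < termClass.size(); j++) {
                    int a = termClass.get(i);
                    int b = termClass.get(j);
                    // Set.add returns false when the key is already present.
                    if (a != b && seen.add(pairKey(a, b))) {
                        pairs.add(new int[] {Math.min(a, b), Math.max(a, b)});
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two overlapping classes, e.g. one built by lemma-lemma keys
        // and one by lemma-stem keys; the pair (2, 3) appears in both.
        List<List<Integer>> classes = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(2, 3, 4));
        System.out.println(uniquePairs(classes).size()); // prints 5, not 6
    }
}
```

The set of seen keys costs memory proportional to the number of distinct pairs, which is the trade-off against re-running the variation rules on duplicates.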

2. Check by pattern whether a term pair has a valid variation rule

Pattern constraints are very fast to test compared to the Groovy part of each variant rule. Try to accept or reject every term pair by pattern constraint before evaluating the Groovy part of the rule.
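The two-stage check described above could look roughly like the following sketch. Everything here is an assumption made for illustration: `TwoStageRule`, `Term`, and the `expensiveCheck` predicate (a plain Java stand-in for the Groovy expression) are hypothetical names, not the TermSuite rule engine. The cheap pattern comparison runs first, and the expensive predicate is evaluated only for pairs that pass it.

```java
import java.util.function.BiPredicate;

// Hypothetical illustration; names are assumptions, not TermSuite classes.
class TwoStageRule {

    /** Minimal term model: a lemma and its syntactic pattern, e.g. "N N". */
    static class Term {
        final String lemma;
        final String pattern;

        Term(String lemma, String pattern) {
            this.lemma = lemma;
            this.pattern = pattern;
        }
    }

    final String sourcePattern;
    final String targetPattern;
    final BiPredicate<Term, Term> expensiveCheck; // stand-in for the Groovy part
    int expensiveCalls = 0;                       // counts costly evaluations

    TwoStageRule(String sourcePattern, String targetPattern,
                 BiPredicate<Term, Term> expensiveCheck) {
        this.sourcePattern = sourcePattern;
        this.targetPattern = targetPattern;
        this.expensiveCheck = expensiveCheck;
    }

    boolean matches(Term source, Term target) {
        // Stage 1: cheap pattern constraint, rejects most pairs early.
        if (!sourcePattern.equals(source.pattern)
                || !targetPattern.equals(target.pattern)) {
            return false;
        }
        // Stage 2: costly rule body, reached only by pattern-compatible pairs.
        expensiveCalls++;
        return expensiveCheck.test(source, target);
    }

    public static void main(String[] args) {
        TwoStageRule rule = new TwoStageRule("N N", "A N N",
                (s, t) -> t.lemma.endsWith(s.lemma));
        Term base = new Term("wind turbine", "N N");
        Term variant = new Term("offshore wind turbine", "A N N");
        Term other = new Term("turbine blade", "N N");

        System.out.println(rule.matches(base, variant)); // true
        System.out.println(rule.matches(base, other));   // false
        System.out.println(rule.expensiveCalls);         // 1
    }
}
```

In this example the second pair is rejected on its pattern alone, so the costly predicate runs once for two pairs.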

3. Intelligent term iteration within class

When a term class is still too big, apply intelligent term iteration (instead of filtering by th directly).

dcram commented

Term gathering has been completely refactored and externalized from its UIMA wrapper.