Optimizing TermGatherer process time
TermGatherer keeps getting slower as gathering complexity grows (compounds, derivates, prefixes, synonyms, etc.), and it is not well optimized.
Three optimization tasks are considered:
1. Do not process the same pair of terms twice
Currently, because pairs are indexed under several keys (lemma-lemma and lemma-stem) and because term classes overlap, the same term pair can be processed several times.
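A minimal sketch of the deduplication idea, assuming each term class is a list of term identifiers and that a pair is unordered (a-b is the same as b-a). The class and method names (`PairDedup`, `uniquePairs`) are hypothetical and are not part of TermSuite's API. A canonical key for each pair is stored in a set, so a pair reached a second time through another index key or an overlapping class is skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: process each unordered term pair only once, even if it
// appears in several term classes (e.g. a lemma-lemma class and a lemma-stem class).
public class PairDedup {

    // Canonical key: the two identifiers in lexicographic order.
    static List<String> pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? List.of(a, b) : List.of(b, a);
    }

    // Each inner list is one term class; returns every distinct pair once.
    static List<String[]> uniquePairs(List<List<String>> classes) {
        Set<List<String>> seen = new HashSet<>();
        List<String[]> result = new ArrayList<>();
        for (List<String> cls : classes) {
            for (int i = 0; i < cls.size(); i++) {
                for (int j = i + 1; j < cls.size(); j++) {
                    String a = cls.get(i);
                    String b = cls.get(j);
                    if (a.equals(b)) continue;
                    // Set.add returns false when the pair was already processed.
                    if (seen.add(pairKey(a, b))) {
                        result.add(new String[] {a, b});
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> classes = List.of(
                List.of("wind turbine", "wind-turbine", "turbine"), // e.g. a lemma-lemma class
                List.of("turbine", "wind turbine"));                // e.g. an overlapping lemma-stem class
        for (String[] p : uniquePairs(classes)) {
            System.out.println(p[0] + " | " + p[1]);
        }
    }
}
```

In this example the second class only contains a pair already produced by the first one, so three pairs are processed instead of four. The extra memory for the `seen` set grows with the number of distinct pairs, which is the usual trade-off for this kind of deduplication.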
2. Check whether a term pair matches a variation rule by pattern first
Pattern constraints are very fast to test compared to the Groovy part of each variation rule. Accept or reject all term pairs on their pattern constraints before evaluating the Groovy part, so that the expensive check only runs on pairs that already match a rule's patterns.
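A minimal sketch of the two-stage check, assuming a term is reduced to a syntactic pattern string (such as `"N N"`) plus its text, and that a rule carries a source pattern, a target pattern and an expensive predicate standing in for the Groovy part. All names here (`TwoStageMatcher`, `Term`, `Rule`, `scriptPart`) are hypothetical and do not reflect TermSuite's actual rule classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch: test the cheap pattern constraint first and only
// evaluate the expensive (Groovy-like) predicate for pairs that pass it.
public class TwoStageMatcher {

    // A term reduced to its syntactic pattern (e.g. "N N") and surface text.
    record Term(String pattern, String text) {}

    // A variation rule: pattern constraints plus a costly scripted check.
    record Rule(String name, String sourcePattern, String targetPattern,
                BiPredicate<Term, Term> scriptPart) {}

    // Stage 1: cheap string comparison of patterns.
    static boolean patternMatches(Rule r, Term source, Term target) {
        return r.sourcePattern().equals(source.pattern())
                && r.targetPattern().equals(target.pattern());
    }

    // Returns the names of rules accepting (source, target).
    static List<String> matchingRules(Term source, Term target, List<Rule> rules) {
        List<String> matched = new ArrayList<>();
        for (Rule r : rules) {
            if (!patternMatches(r, source, target)) continue; // rejected cheaply
            // Stage 2: only pattern-compatible pairs pay for the expensive test.
            if (r.scriptPart().test(source, target)) matched.add(r.name());
        }
        return matched;
    }

    public static void main(String[] args) {
        Rule modifierInsertion = new Rule("modifier-insertion", "N N", "A N N",
                (s, t) -> t.text().endsWith(s.text()));
        Rule other = new Rule("other", "A N", "A A N", (s, t) -> true);
        Term source = new Term("N N", "wind turbine");
        Term target = new Term("A N N", "offshore wind turbine");
        System.out.println(matchingRules(source, target, List.of(modifierInsertion, other)));
    }
}
```

Here the second rule is discarded by the pattern test alone, so its scripted part is never evaluated. The gain depends on how selective the pattern constraints are relative to the cost of the scripted part.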
3. Intelligent term iteration within class
When a term class is still too big, apply intelligent term iteration within the class instead of filtering by th directly.
Term gathering has been completely refactored and externalized from its UIMA wrapper.