termsuite/termsuite-core

Post-processing: promote frequent variants as terms

Closed this issue · 2 comments

dcram commented

Reduce the number of variants per term to at most 10-15 by promoting frequent variants to terms.

dcram commented

Also, in German, there are numerous size-3 words in the top 10. Investigate why.

dcram commented

Post-processing has been refactored. Instead of promoting frequent variants to terms, second-order variants (variants of a variant) are no longer displayed in the variant bag, and a [+] label is appended to the V label in the TSV output to indicate that the variant has variants of its own.

T	nn: wind turbine
V[S][+]	nnn: horizontal-axis wind turbine
V[S]	nnn: wind turbine rotor
V[S]	nnn: wind turbine application
V[S]	ann: offshore wind turbine
V[S]	nnn: wind turbine sound
V[S][+]	nnn: wind turbine blade
V[S]	nnn: dfig wind turbine
V[S]	annn: variable speed wind turbine
V[S][+]	nnn: wind turbine noise
V[S][+]	nnn: wind turbine concept
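
For anyone consuming this output downstream, here is a minimal Python sketch of how the `[+]` marker could be read back. It is not part of TermSuite: the line shape (`label<TAB>pattern: text`) is inferred only from the sample above, and `Row`, `parse_line`, and `LINE_RE` are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass

# Assumed line shape, inferred from the sample output only:
#   <T or V><zero or more [tag] groups><TAB><pattern>: <term text>
LINE_RE = re.compile(
    r"^(?P<kind>[TV])(?P<tags>(?:\[[^\]]*\])*)\t(?P<pattern>\w+): (?P<text>.+)$"
)


@dataclass
class Row:
    kind: str      # "T" for a term, "V" for a variant
    tags: list     # bracketed labels, e.g. ["S", "+"]
    pattern: str   # syntactic pattern, e.g. "nnn"
    text: str      # the term or variant itself

    @property
    def has_own_variants(self) -> bool:
        # A variant carrying [+] has variants of its own that are not listed.
        return self.kind == "V" and "+" in self.tags


def parse_line(line: str) -> Row:
    """Parse one line of the (assumed) TSV layout into a Row."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    tags = re.findall(r"\[([^\]]*)\]", match.group("tags"))
    return Row(match.group("kind"), tags, match.group("pattern"), match.group("text"))


# A few lines copied from the sample above.
sample = (
    "T\tnn: wind turbine\n"
    "V[S][+]\tnnn: horizontal-axis wind turbine\n"
    "V[S]\tnnn: wind turbine rotor\n"
    "V[S][+]\tnnn: wind turbine blade\n"
)

rows = [parse_line(line) for line in sample.splitlines()]
flagged = [row.text for row in rows if row.has_own_variants]
print(flagged)
```

With the four sample lines, `flagged` holds the two variants marked `[+]`. A real export may contain additional columns or labels that this pattern does not cover.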