After infra juggle, accents are not filtered out and the tokenizer does not recognize simple words
rueter opened this issue · 0 comments
rueter commented
In src/fst/morphology/stems/nouns.lexc, the entry has the stress mark on the lower side:
табак:таба́к м_b_Р2 "weight: 4.490091974131408" ;
BUT the unaccented form is not recognized in lookup:
lang-rus jackrueter$ hfst-lookup src/fst/analyser-gt-norm.hfstol
> табак
табак табак+? inf
This is related to several tickets filed after the move.
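
To make the expected behaviour concrete, here is a minimal Python sketch of the accent removal that the lookup result suggests is missing. It runs outside the FST and is only an illustration: `strip_stress` is a hypothetical helper, not something in the repository, and it assumes the stress mark is the combining acute accent (U+0301), as in the lexc entry above.

```python
import unicodedata

# Assumption: stress is marked with the combining acute accent, U+0301.
COMBINING_ACUTE = "\u0301"


def strip_stress(form: str) -> str:
    """Remove stress marks from a word form (hypothetical helper)."""
    # Decompose so that any precomposed accented letters are split up.
    decomposed = unicodedata.normalize("NFD", form)
    without_stress = decomposed.replace(COMBINING_ACUTE, "")
    # Recompose so letters such as й and ё return to their usual form.
    return unicodedata.normalize("NFC", without_stress)


stressed = "таба\u0301к"  # lower side of the lexc entry
print(strip_stress(stressed) == "табак")
```

If the filter were applied, the form the analyser stores would match the plain input `табак`, which is the string that currently comes back as unknown.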