giellalt/lang-rus

After the infra juggle, accents are not filtered out and the tokenizer does not recognize simple words

rueter opened this issue

In src/fst/morphology/stems/nouns.lexc the entry is:

табак:таба́к м_b_Р2 "weight: 4.490091974131408" ;
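The surface side of this entry carries a combining acute accent (U+0301 after the second а), so the analyser has to drop it before an unaccented input like табак can match. The expected filtering can be sketched in Python as below. This is a minimal sketch that assumes the filter only needs to remove combining stress marks (acute U+0301, plus grave U+0300 as an assumption). The function name `strip_stress` is hypothetical and is not part of the giellalt build.

```python
import unicodedata

# Combining stress marks to drop (assumption: acute U+0301 as in "таба́к",
# grave U+0300 included for completeness).
STRESS_MARKS = {"\u0301", "\u0300"}


def strip_stress(word: str) -> str:
    """Remove combining stress marks while leaving letters such as й and ё intact."""
    # Decompose so that any stress mark becomes a separate code point.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    # Recompose so й (и + U+0306) and ё (е + U+0308) return to single code points.
    return unicodedata.normalize("NFC", stripped)


print(strip_stress("таба\u0301к"))  # табак
```

Removing every combining mark would turn й into и and ё into е, which is why only the stress marks are filtered here.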

But the normative analyser does not recognize the unaccented form:

lang-rus jackrueter$ hfst-lookup src/fst/analyser-gt-norm.hfstol 
> табак
табак	табак+?	inf
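To see how many other words are affected, one could capture hfst-lookup output for a word list (for example, by redirecting a file to its standard input) and collect every token whose analysis is `+?`, as табак is above. The following is a sketch that assumes the tab-separated `input <TAB> analysis <TAB> weight` layout shown in this session. The helper `find_unknowns` is hypothetical.

```python
def find_unknowns(lookup_output: str) -> list[str]:
    """Return input tokens that hfst-lookup marks as unrecognized (+?)."""
    unknown: list[str] = []
    for line in lookup_output.splitlines():
        fields = line.split("\t")
        # Result lines look like: input <TAB> analysis <TAB> weight.
        # Blank separator lines and anything without a tab are skipped.
        if len(fields) < 2:
            continue
        token, analysis = fields[0], fields[1]
        if analysis.endswith("+?") and token not in unknown:
            unknown.append(token)
    return unknown


sample = "табак\tтабак+?\tinf\n\n"
print(find_unknowns(sample))  # ['табак']
```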

This is related to several other tickets opened after the move.