SnowballStemmer: how to avoid transliteration?
satyrmipt opened this issue
satyrmipt commented
Please look at the code below. Is there a way to avoid the transliteration of the "sheet" substring to "шеет" in the second case?
Code:
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer(language='russian')
stemmer.stem(""), stemmer.stem("русский текст"), stemmer.stem("english text")
Output:
('', '<шеет>русский текст</шеет>', 'english text')
satyrmipt commented
Oh, I forgot the formatting, so my question came out wrong. Let's try again:
Code:
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer(language='russian')
stemmer.stem("<sheet>"), stemmer.stem("<sheet>русский текст</sheet>"), stemmer.stem("<sheet>english text</sheet>")
Output:
('<sheet>', '<шеет>русский текст</шеет>', '<sheet>english text</sheet>')
Question:
How can I avoid the "sheet" -> "шеет" transliteration?
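A likely explanation, based on my reading of NLTK's `RussianStemmer` source (treat this as an assumption, since it may differ between versions): `stem()` returns its input untouched when every character is at or below U+00FF, which is why the `english text` case comes back unchanged. Otherwise it transliterates the whole string to Latin, applies the Snowball rules, and transliterates the result back to Cyrillic. Latin letters that were already in the input (`sheet`) therefore come out as Cyrillic (`шеет`). One possible workaround is to pass only the Cyrillic words to the stemmer and copy everything else through as-is. The sketch below assumes a "word" is a run of characters from the Cyrillic block U+0400 to U+04FF; `CYRILLIC_WORD` and `stem_cyrillic_only` are hypothetical helpers, not part of NLTK.

```python
import re

from nltk.stem import SnowballStemmer

# Assumption: a Russian word is a run of characters from the Cyrillic block.
# Tags, Latin text, spaces and punctuation are never sent to the stemmer.
CYRILLIC_WORD = re.compile(r"[\u0400-\u04FF]+")

stemmer = SnowballStemmer(language="russian")


def stem_cyrillic_only(text: str) -> str:
    """Stem each Cyrillic word in `text` separately, leaving all other characters untouched."""
    return CYRILLIC_WORD.sub(lambda m: stemmer.stem(m.group(0)), text)


print(stem_cyrillic_only("<sheet>русский текст</sheet>"))
print(stem_cyrillic_only("<sheet>english text</sheet>"))
```

A side effect is that each Russian word is stemmed on its own. That matches what `stem()` expects anyway: as far as I can tell it strips suffixes only from the end of the string it receives, so passing a whole phrase such as `<sheet>русский текст</sheet>` at most affects the ending of the string, not each word.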