Lemmatizing بخونه not giving the correct result
longjiang opened this issue · 1 comment
longjiang commented
The surface form "بخونه" ('read') should lemmatize to "خواند#خوان", but the lemmatizer returns the original form "بخونه" unchanged:
from hazm import Lemmatizer, Normalizer

# Normalization leaves the word unchanged
normalizer = Normalizer()
normalized = normalizer.normalize('بخونه')
print(normalized)
# بخونه

# The standard Lemmatizer also returns the input as-is
lemmatizer = Lemmatizer()
lemmatized = lemmatizer.lemmatize('بخونه')
print(lemmatized)
# بخونه
sir-kokabi commented
Hello.
For colloquial (informal) text, you should use InformalLemmatizer.
from hazm import InformalLemmatizer
lemmatizer = InformalLemmatizer()
lemmatized = lemmatizer.lemmatize('بخونه')
print(lemmatized)
# خواند#خوان
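If the input mixes formal and colloquial words, one possible approach is to try the standard lemmatizer first and fall back to the informal one only when the word comes back unchanged. The following is a minimal sketch, not part of hazm: `lemmatize_with_fallback` is a hypothetical helper, and it assumes that an unrecognized word is returned as-is, which is the behavior shown in this issue.

```python
from typing import Callable


def lemmatize_with_fallback(
    word: str,
    formal: Callable[[str], str],
    informal: Callable[[str], str],
) -> str:
    """Hypothetical helper: try `formal` first, then `informal`.

    Assumption: a lemmatizer that does not recognize a word returns
    the word unchanged, as the standard Lemmatizer does in this issue.
    """
    lemma = formal(word)
    if lemma != word:
        # The formal lemmatizer produced a different form; keep it.
        return lemma
    # The word came back unchanged, so try the informal lemmatizer.
    return informal(word)
```

With hazm, the two callables would presumably be `Lemmatizer().lemmatize` and `InformalLemmatizer().lemmatize`. Note that a formal word whose lemma is identical to its surface form would also be passed to the informal lemmatizer, so this heuristic may not suit every corpus.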