roshan-research/hazm

Lemmatizing بخونه not giving the correct result

longjiang opened this issue · 1 comment

The surface form "بخونه" ('read') should lemmatize to "خواند#خوان", but the lemmatizer returns the original form "بخونه" unchanged.

from hazm import Lemmatizer, Normalizer

# Normalization leaves the colloquial form unchanged.
normalizer = Normalizer()
normalized = normalizer.normalize('بخونه')
print(normalized)
# بخونه

# The formal Lemmatizer also returns the input unchanged.
lemmatizer = Lemmatizer()
lemmatized = lemmatizer.lemmatize('بخونه')
print(lemmatized)
# بخونه

Hello.
For colloquial text, you should use InformalLemmatizer.

from hazm import InformalLemmatizer

lemmatizer = InformalLemmatizer()
lemmatized = lemmatizer.lemmatize('بخونه')
print(lemmatized)
# خواند#خوان
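When a text mixes formal and colloquial words, one possible approach is to try several lemmatizers in order and keep the first result that differs from the input. The sketch below is only an illustration of that idea: `lemmatize_with_fallback` is a hypothetical helper, not part of hazm, and it takes plain callables so that it makes no assumption about how any particular lemmatizer treats words it does not recognize.

```python
from typing import Callable, Iterable


def lemmatize_with_fallback(
    word: str, lemmatizers: Iterable[Callable[[str], str]]
) -> str:
    """Return the first lemma that differs from the input, else the input.

    Hypothetical helper (not a hazm API). It assumes a lemmatizer that
    does not recognize a word hands it back unchanged, as the formal
    Lemmatizer does for 'بخونه' in this issue.
    """
    target = word.strip()
    for lemmatize in lemmatizers:
        lemma = lemmatize(word)
        # Ignore empty results and results identical to the input
        # (compared after stripping surrounding whitespace).
        if lemma and lemma.strip() != target:
            return lemma
    # No lemmatizer changed the word, so return it as given.
    return word
```

With hazm installed, this could be called as `lemmatize_with_fallback('بخونه', [Lemmatizer().lemmatize, InformalLemmatizer().lemmatize])`, using the two classes shown in this thread. Whether the formal lemmatizer should come first is a judgment call that depends on the text.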