TypeError: '<' not supported between instances of 'Word' and 'Word'
shantanuo opened this issue · 8 comments
It works for some words but raises an error for others.
from spylls.hunspell import Dictionary
dictionary = Dictionary.from_files('/root/marathi/dicts/mr_IN')
for suggestion in dictionary.suggest('मान्वी'):
    print(suggestion)
मानवी
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-cec47a2e0f5b> in <module>
3 dictionary = Dictionary.from_files('/root/marathi/dicts/mr_IN')
4
----> 5 for suggestion in dictionary.suggest('मान्वी'):
6 print(suggestion)
/root/miniforge3/lib/python3.7/site-packages/spylls/hunspell/dictionary.py in suggest(self, word)
201 """
202
--> 203 yield from self.suggester(word)
/root/miniforge3/lib/python3.7/site-packages/spylls/hunspell/algo/suggest.py in __call__(self, word)
181 word: Word to check
182 """
--> 183 yield from (suggestion.text for suggestion in self.suggest_internal(word))
184
185 def suggest_internal(self, word: str) -> Iterator[Suggestion]: # pylint: disable=too-many-statements
/root/miniforge3/lib/python3.7/site-packages/spylls/hunspell/algo/suggest.py in <genexpr>(.0)
181 word: Word to check
182 """
--> 183 yield from (suggestion.text for suggestion in self.suggest_internal(word))
184
185 def suggest_internal(self, word: str) -> Iterator[Suggestion]: # pylint: disable=too-many-statements
/root/miniforge3/lib/python3.7/site-packages/spylls/hunspell/algo/suggest.py in suggest_internal(self, word)
345
346 ngrams_seen = 0
--> 347 for sug in self.ngram_suggestions(word, handled=handled):
348 for res in handle_found(Suggestion(sug, 'ngram'), check_inclusion=True):
349 ngrams_seen += 1
/root/miniforge3/lib/python3.7/site-packages/spylls/hunspell/algo/suggest.py in ngram_suggestions(self, word, handled)
508 known={*(word.lower() for word in handled)},
509 maxdiff=self.aff.MAXDIFF,
--> 510 onlymaxdiff=self.aff.ONLYMAXDIFF)
511
512 def phonet_suggestions(self, word: str) -> Iterator[str]:
/root/miniforge3/lib/python3.7/site-packages/spylls/hunspell/algo/ngram_suggest.py in ngram_suggest(misspelling, dictionary_words, prefixes, suffixes, known, maxdiff, onlymaxdiff)
81 heapq.heappushpop(root_scores, (score, word.stem, word))
82 else:
---> 83 heapq.heappush(root_scores, (score, word.stem, word))
84
85 roots = heapq.nlargest(MAX_ROOTS, root_scores)
TypeError: '<' not supported between instances of 'Word' and 'Word'
Oh, that's an interesting one. I can imagine how it happened, though. Can you please show me the dictionaries you are using, so that I can test it more easily?
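For readers wondering how this kind of error can arise: the traceback shows tuples of the form `(score, word.stem, word)` being pushed onto a heap. Python compares tuples element by element, so when two entries tie on both score and stem, the comparison falls through to the third element. The sketch below reproduces that with a hypothetical stand-in `Word` class (not spylls' actual class) and shows one common workaround, a unique counter as tie-breaker. This is an illustration only and not necessarily how the fix in master was implemented.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical stand-in for a dictionary entry; it defines equality
    # but no ordering, so `a < b` raises TypeError.
    stem: str
    flags: frozenset = frozenset()


def push_naive(heap, score, word):
    # Same tuple shape as in the traceback: (score, stem, word).
    heapq.heappush(heap, (score, word.stem, word))


_counter = itertools.count()


def push_with_tiebreaker(heap, score, word):
    # The unique counter guarantees the comparison never reaches `word`.
    heapq.heappush(heap, (score, word.stem, next(_counter), word))


# Two entries with the same stem but different flags.
a = Word("मान", frozenset({"A"}))
b = Word("मान", frozenset({"B"}))

naive_heap = []
push_naive(naive_heap, 1, a)
try:
    push_naive(naive_heap, 1, b)  # ties on score and stem, then compares Words
    naive_failed = False
except TypeError as err:
    naive_failed = True
    print(err)

safe_heap = []
push_with_tiebreaker(safe_heap, 1, a)
push_with_tiebreaker(safe_heap, 1, b)
print(len(safe_heap))
```

This would also explain why the error shows up only for some words: it needs two candidates with the same score and the same stem that are nonetheless different entries.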
Here is how to reproduce it:
!wget -N https://kagapa.s3.ap-south-1.amazonaws.com/with_acor_N.oxt
!unzip -o ./with_acor_N.oxt
from spylls.hunspell import Dictionary
dictionary = Dictionary.from_files('./dicts/mr_IN')
#####
# Returns True
print(dictionary.lookup('मानवी'))
# This should not return anything because the word is correct
for suggestion in dictionary.suggest('मानवी'):
    print(suggestion)
#####
# Returns False
print(dictionary.lookup('मान्वी'))
# Should return suggestions, but getting an error
for suggestion in dictionary.suggest('मान्वी'):
    print(suggestion)
It works as expected using the hunspell module.
#sudo apt-get install -y libhunspell-dev
#pip install python-dev
#pip install hunspell
import hunspell
spellchecker = hunspell.HunSpell(
    "./dicts/mr_IN.dic",
    "./dicts/mr_IN.aff",
)
spellchecker.spell('मानवी')
for suggestion in spellchecker.suggest('मानवी'):
    print(suggestion)
spellchecker.spell('मान्वी')
for suggestion in spellchecker.suggest('मान्वी'):
    print(suggestion)
What is the advantage of using spylls over hunspell?
Thanks for the details, I'll look into it!
What is the advantage of using spylls over hunspell?
If you just need to check spelling, I believe there is not much: perhaps the fact that spylls is pure Python and can therefore be installed where hunspell cannot (some CI environments?), and that it is hackable (you can look into the dictionary contents, the settings, etc.).
The goal of the project is to be readable and hackable, while (hopefully) reproducing all of hunspell's behavior.
Yes, I can see where it can be useful. For example, someone could resolve this bug...
It would be helpful if I could nest more than 2 levels of affix rules.
The type error is fixed in master, thanks for noticing!
for suggestion in dictionary.suggest('मान्वी'):
    print(suggestion)
# Now prints:
# मानवी
# मानावी
As for whether suggestions should be printed for an already correct word, I prefer to keep it simple. It is just as easy for client code to check whether the word is correct, and printing suggestions for a correct word might be considered useful functionality, too (printing words similar to this one).
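A minimal sketch of that client-side check, for anyone who wants suggestions only for misspelled words. The helper name `suggestions_if_misspelled` and the `StubDictionary` class are hypothetical and exist only to make the example self-contained; with spylls you would pass the `Dictionary` object, whose `lookup` and `suggest` methods are the ones used earlier in this thread.

```python
from typing import List


def suggestions_if_misspelled(dictionary, word: str) -> List[str]:
    """Return suggestions only when `word` fails the spelling check."""
    if dictionary.lookup(word):
        return []  # word is correct, nothing to suggest
    return list(dictionary.suggest(word))


class StubDictionary:
    # Hypothetical stand-in exposing the same two methods as the
    # spylls Dictionary used above, so the sketch runs without spylls.
    def __init__(self, known, suggestions):
        self._known = set(known)
        self._suggestions = list(suggestions)

    def lookup(self, word):
        return word in self._known

    def suggest(self, word):
        yield from self._suggestions


stub = StubDictionary(known=["मानवी"], suggestions=["मानवी", "मानावी"])
print(suggestions_if_misspelled(stub, "मानवी"))
print(suggestions_if_misspelled(stub, "मान्वी"))
```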
The hunspell module returns these 4 suggestions, while spylls returns only 2:
मानावी
मान्यवर
मान्यही
मानव्य
One word, "मानावी", is common to both. The word returned by spylls, "मानवी", is not among the hunspell suggestions.
Can you guess the reason?
मानावी
मानवी
As a matter of fact, the word that spylls returns and hunspell does not, 'मानवी', is the correct expected word! I would like to know how this has been achieved.
I would like to know how this has been achieved.
That's an interesting one :) Most of the algorithms in the original Hunspell work well with, and were tested on, 1- or 2-byte characters. As Marathi characters are 3 bytes each in UTF-8, some of Hunspell's internals fall back to a "default" (almost "random") mode, including the n-gram-based suggestion (word-similarity scoring). Thanks to Python's excellent Unicode support, spylls doesn't have this limitation. So the algorithms are the same; they just work more correctly with 3-byte characters.
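To make the byte-width point concrete, here is a small sketch that compares code-point and UTF-8 byte lengths for the two words from this thread, and builds character bigrams over code points. The `char_ngrams` helper is hypothetical and is only a simplified illustration of n-gram overlap; it is not the scoring function used by Hunspell or spylls.

```python
def char_ngrams(text: str, n: int) -> set:
    """Return the set of n-grams of `text`, counted in Unicode code points."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


misspelled = "मान्वी"
correct = "मानवी"

for word in (misspelled, correct):
    # len() on a Python str counts code points, not bytes.
    code_points = len(word)
    utf8_bytes = len(word.encode("utf-8"))
    print(word, code_points, utf8_bytes)

# Bigrams over code points: each Devanagari character stays intact,
# whereas slicing the UTF-8 bytes would cut characters into pieces.
shared = char_ngrams(misspelled, 2) & char_ngrams(correct, 2)
print(sorted(shared))
```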