viig99/SymSpellCppPy

RuntimeError: The expression contained an invalid character range, such as [b-a] in most encodings

romikforest opened this issue · 6 comments

What could cause this error?

import SymSpellCppPy
symSpell = SymSpellCppPy.SymSpell()
symSpell.load_dictionary(corpus='frequency_dictionary_en_82_765.txt', term_index=0, count_index=1, separator=" ")

terms = symSpell.lookup_compound('Any text.')

RuntimeError: The expression contained an invalid character range, such as [b-a] in most encodings.

Python 3.9.7
SymSpellCppPy==0.0.14
MacOS 11.6

Maybe the dictionary isn't loaded. Can you check that it's available at that path? Or it could be caused by an invalid character in the dictionary. I'm not really sure what the issue could be at this point.

Hi. The dictionary is loaded: lookup and word_count work. Only lookup_compound fails.

The word regex is:
xregex r(XL("['’\\w-\\[_\\]]+"));
so '.' isn't added to the word list. You can try without it. If you need '.' in the compound lookup, could you please elaborate on the use case?
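To see roughly what that word pattern keeps, here is an approximation using Python's standard re module. This is not the library's code: Python's \w may differ from the C++ engine's \w on non-ASCII text, and the '-' is moved to the end of the class because Python also rejects \w as a range endpoint. The tokenize helper is hypothetical, for illustration only.

```python
import re

# Approximation of the library's C++ word pattern (hypothetical port).
# The '-' sits last in the class, so it is a literal dash, not a range operator.
WORD_PATTERN = re.compile(r"['’\w\[_\]-]+")


def tokenize(text: str) -> list[str]:
    """Return the runs of characters matched by the approximated word pattern."""
    return WORD_PATTERN.findall(text)


print(tokenize("Any text."))  # ['Any', 'text']
```

The trailing '.' is simply not part of any match, which is why it never reaches the word list.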

It can't parse anything at all, not even a plain word:

import SymSpellCppPy
symSpell = SymSpellCppPy.SymSpell()

symSpell.load_dictionary(corpus='frequency_dictionary_en_82_765.txt', term_index=0, count_index=1, separator=" ")
print(symSpell.word_count())

terms = symSpell.lookup_compound("the")

result = (x.term for x in terms)
print(*result)
python t2.py
82834
Traceback (most recent call last):
  File "/Users/romik/Work/swo/Release3/spellers/t2.py", line 7, in <module>
    terms = symSpell.lookup_compound("the")
RuntimeError: The expression contained an invalid character range, such as [b-a] in most encodings.

It fails even on an empty string; no input of any kind works.

This is probably a difference between compilers.

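I had the same issue. It's caused by the "-" in the sequence
xregex r(XL("['’\\w-\\[_\\]]+"));
The regex engine treats the dash as a range operator, as in [a-b] or [0-9], but the atoms to its left and right (\w and \[) are not valid range endpoints: \w is a whole character class, not a single character. Removing the dash, escaping it properly, or moving it to the end of the bracket expression should fix the problem.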
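The same rule can be checked quickly from Python, whose standard re module also refuses a class escape such as \w as a range endpoint (it raises re.error with "bad character range"). This sketch only mirrors the pattern from the C++ source with the C++ string escaping removed; the real fix still has to go into the library's C++ code. The compiles helper and the pattern names are hypothetical.

```python
import re

# Pattern from the C++ source, with the C++ string-literal escaping removed.
ORIGINAL = r"['’\w-\[_\]]+"
# Candidate fix 1: escape the dash so it is a literal character.
ESCAPED = r"['’\w\-\[_\]]+"
# Candidate fix 2: move the dash to the end of the class, where it is literal.
TRAILING = r"['’\w\[_\]-]+"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


for name, pattern in [("original", ORIGINAL), ("escaped", ESCAPED), ("trailing", TRAILING)]:
    print(name, compiles(pattern))
# original False
# escaped True
# trailing True
```

Both fixed variants still split "Any text." into Any and text, so either keeps the tokenizer's behaviour while removing the invalid range.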