Erroneous handling of two-character IPA symbols
Closed · 1 comment
oserikov commented
Steps to reproduce:
>>> a = "t͡ʃ"
>>> from abydos.distance import PhoneticEditDistance
>>> di = PhoneticEditDistance()
>>> di.dist(a, '')
1.0
>>> di.dist(a, 'UNK')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../abydos/distance/_phonetic_edit_distance.py", line 288, in dist
    return self.dist_abs(src, tar) / normalize_term
  File ".../abydos/distance/_phonetic_edit_distance.py", line 232, in dist_abs
    d_mat = self._alignment_matrix(src, tar, backtrace=False)
  File ".../abydos/distance/_phonetic_edit_distance.py", line 157, in _alignment_matrix
    if src[i] != tar[j]
IndexError: list index out of range
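I believe that simply computing the stored lengths after the ipa_to_features conversion, at the lines referenced below, would fix the bug: the lengths are currently taken from the raw strings, which can be longer than the converted feature lists. I'm not sure whether the repo is still maintained, so I'm leaving an issue here to make others aware.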
abydos/abydos/distance/_phonetic_edit_distance.py
Lines 144 to 148 in 344346a
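To illustrate the idea, here is a minimal sketch that does not use abydos at all. `to_phones` is a hypothetical stand-in for `ipa_to_features` (it only groups code points into phones), and the distance uses plain unit costs rather than the feature-weighted costs of `PhoneticEditDistance`. The point is only the ordering: the lengths are measured after the conversion, so the matrix dimensions always match the converted lists.

```python
import unicodedata

# Tie bars (above and below) join two letters into one phone, e.g. t + ʃ.
TIE_BARS = {"\u0361", "\u035c"}


def to_phones(ipa):
    """Hypothetical stand-in for ipa_to_features: group code points into phones.

    A combining mark attaches to the preceding phone, and a tie bar also
    pulls the following character into that same phone.
    """
    phones = []
    join_next = False
    for ch in ipa:
        if phones and (join_next or unicodedata.combining(ch)):
            phones[-1] += ch
        else:
            phones.append(ch)
        join_next = ch in TIE_BARS
    return phones


def edit_distance(src, tar):
    """Unit-cost Levenshtein distance over phones (not abydos's cost model)."""
    src, tar = to_phones(src), to_phones(tar)
    # Measure the lengths AFTER conversion, so they describe the lists that
    # are actually indexed below, not the raw strings.
    src_len, tar_len = len(src), len(tar)

    d = [[0] * (tar_len + 1) for _ in range(src_len + 1)]
    for i in range(src_len + 1):
        d[i][0] = i
    for j in range(tar_len + 1):
        d[0][j] = j

    for i in range(src_len):
        for j in range(tar_len):
            cost = 0 if src[i] == tar[j] else 1
            d[i + 1][j + 1] = min(
                d[i][j + 1] + 1,  # deletion
                d[i + 1][j] + 1,  # insertion
                d[i][j] + cost,   # substitution or match
            )
    return d[src_len][tar_len]


a = "t\u0361\u0283"  # the affricate from the report: t, tie bar, ʃ
print(len(a), len(to_phones(a)))
print(edit_distance(a, "UNK"))
```

In this sketch the raw string has three code points but converts to a single phone, which is the kind of mismatch that makes a loop bounded by the raw length run past the end of the converted list.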
oserikov commented
Ah sorry, the fix is already in master. Sadly, it's not in the PyPI distribution.