chrislit/abydos

Erroneous handling of two-character IPA symbols

Closed · 1 comment

Steps to reproduce:

>>> a = "t͡ʃ"
>>> from abydos.distance import PhoneticEditDistance
>>> di = PhoneticEditDistance()
>>> di.dist(a, '')
1.0
>>> di.dist(a, 'UNK')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../abydos/distance/_phonetic_edit_distance.py", line 288, in dist
    return self.dist_abs(src, tar) / normalize_term
  File ".../abydos/distance/_phonetic_edit_distance.py", line 232, in dist_abs
    d_mat = self._alignment_matrix(src, tar, backtrace=False)
  File ".../abydos/distance/_phonetic_edit_distance.py", line 157, in _alignment_matrix
    if src[i] != tar[j]
IndexError: list index out of range

I believe that simply computing the lengths after the ipa_to_features calls would fix the bug. I'm not sure whether the repo is still maintained, so I'm leaving an issue here for others to be aware of. The relevant lines currently read:

src_len = len(src)
tar_len = len(tar)
src_list = ipa_to_features(src)
tar_list = ipa_to_features(tar)
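
To illustrate why the order matters, here is a minimal, self-contained sketch. It does not use abydos: `segment_ipa` is a hypothetical stand-in for `ipa_to_features` that simply groups a tie-barred symbol into one item, and `count_mismatches` is a hypothetical loop that indexes the converted lists the way the traceback suggests. The only point is that the raw string and the converted list can have different lengths.

```python
import unicodedata

TIE_BAR = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, ties two IPA letters together


def segment_ipa(text):
    """Hypothetical stand-in for ipa_to_features: one list item per IPA segment."""
    segments = []
    join_next = False
    for ch in text:
        # Combining marks, and the letter right after a tie bar, extend the last segment.
        if segments and (join_next or unicodedata.combining(ch)):
            segments[-1] += ch
        else:
            segments.append(ch)
        join_next = ch == TIE_BAR
    return segments


def lengths_before(src, tar):
    # Order from the snippet above: lengths are taken from the raw strings.
    src_len, tar_len = len(src), len(tar)
    return src_len, tar_len, segment_ipa(src), segment_ipa(tar)


def lengths_after(src, tar):
    # Proposed order: convert first, then measure the converted lists.
    src_list, tar_list = segment_ipa(src), segment_ipa(tar)
    return len(src_list), len(tar_list), src_list, tar_list


def count_mismatches(src_len, tar_len, src_list, tar_list):
    # Hypothetical loop that indexes the lists using the stored lengths.
    return sum(
        src_list[i] != tar_list[j]
        for i in range(src_len)
        for j in range(tar_len)
    )


affricate = "t\u0361\u0283"  # "t͡ʃ" is three code points but one segment

try:
    count_mismatches(*lengths_before(affricate, "UNK"))
except IndexError as exc:
    print("lengths before conversion:", exc)

print("lengths after conversion:", count_mismatches(*lengths_after(affricate, "UNK")))
```

With the lengths taken from the raw strings, the loop runs past the end of the one-item list and raises IndexError; with the lengths taken after conversion it stays in bounds.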

Ah sorry, the fix is already in master. Sadly, it's not in the PyPI distribution.