WorksApplications/SudachiPy

AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'begin'

hiroshi-matsuda-rit opened this issue · 7 comments

This error might be related to the cythonization. @polm @sorami
Do you have the test cases for this API?

  File "/mnt/c/git/spaCy/venv.wsl/lib/python3.8/site-packages/sudachipy/morpheme.py", line 56, in split
    return self.list.split(mode, self.index, wi)
  File "/mnt/c/git/spaCy/venv.wsl/lib/python3.8/site-packages/sudachipy/morphemelist.py", line 75, in split
    n.begin = offset
AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'begin'

I had similar cases while investigating #128.

Sorry, no, there were no test cases for this method.

I think the error occurs because, after Cythonization, the attributes are no longer directly accessible from Python, i.e., it should be n.set_begin() instead (this method already exists).

There may be more such cases, which the current test cases didn't catch.
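To illustrate the failure mode in plain Python, here is a rough sketch, not SudachiPy's actual code. It uses `__slots__` to imitate an extension type whose fields are not exposed as Python attributes. `NodeSketch` and `get_begin()` are hypothetical names for this example; only `set_begin()` is mentioned above.

```python
class NodeSketch:
    """Stand-in for a Cythonized node: fixed storage, no instance __dict__."""

    # Only these private slots exist; there is no public 'begin' attribute.
    __slots__ = ("_begin", "_end")

    def __init__(self, begin=0, end=0):
        self._begin = begin
        self._end = end

    def set_begin(self, value):
        # Accessor method, analogous to the existing n.set_begin().
        self._begin = value

    def get_begin(self):
        # Hypothetical getter, added so the example can read the value back.
        return self._begin


node = NodeSketch()

# Direct assignment fails because 'begin' is neither a slot nor a property.
try:
    node.begin = 3
    direct_assignment_failed = False
except AttributeError:
    direct_assignment_failed = True

# Going through the setter works.
node.set_begin(3)
```

In a real Cython `cdef class`, the equivalent would be either to call the accessor methods or to declare the attributes `public`. Which of the two the project prefers is not stated here.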

from sudachipy import tokenizer
from sudachipy import dictionary

tokenizer_obj = dictionary.Dictionary().create()

mode = tokenizer.Tokenizer.SplitMode.C
morpheme = tokenizer_obj.tokenize("国家公務員", mode)[0]
morpheme.surface() # '国家公務員'

morpheme.split(tokenizer.Tokenizer.SplitMode.A)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-af36be3916ed> in <module>
----> 1 morpheme.split(tokenizer.Tokenizer.SplitMode.A)

SudachiPy/sudachipy/morpheme.py in split(self, mode)
     54     def split(self, mode):
     55         wi = self.get_word_info()
---> 56         return self.list.split(mode, self.index, wi)
     57
     58     def is_oov(self):

SudachiPy/sudachipy/morphemelist.py in split(self, mode, index, wi)
     73         for wid in word_ids:
     74             n = latticenode.LatticeNode(self.lexicon, 0, 0, 0, wid)
---> 75             n.begin = offset
     76             offset += n.get_word_info().head_word_length
     77             n.end = offset

AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'begin'

I have fixed the case, and added a test for this method in #134.
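For reference, the offset loop from the traceback could be rewritten with accessor methods along these lines. This is only a sketch under assumptions, not the actual change in #134: `NodeSketch`, `assign_offsets()`, `set_end()`, the getters, and the lengths `[2, 2, 1]` are all made up for illustration.

```python
class NodeSketch:
    """Hypothetical stand-in for LatticeNode with accessor methods only."""

    __slots__ = ("_begin", "_end", "_length")

    def __init__(self, head_word_length):
        self._begin = 0
        self._end = 0
        self._length = head_word_length

    def set_begin(self, value):
        self._begin = value

    def set_end(self, value):  # assumed counterpart of set_begin()
        self._end = value

    def get_begin(self):
        return self._begin

    def get_end(self):
        return self._end

    def head_word_length(self):
        return self._length


def assign_offsets(nodes, start):
    """Give each sub-unit node consecutive [begin, end) offsets from start."""
    offset = start
    for n in nodes:
        n.set_begin(offset)             # was: n.begin = offset
        offset += n.head_word_length()
        n.set_end(offset)               # was: n.end = offset
    return nodes


# Illustrative lengths for three sub-units of a five-character morpheme.
nodes = assign_offsets([NodeSketch(2), NodeSketch(2), NodeSketch(1)], start=0)
spans = [(n.get_begin(), n.get_end()) for n in nodes]
```

A regression test for Morpheme.split() could then assert on the resulting spans or surfaces, so that a missing accessor shows up as a test failure instead of at runtime.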

I am now looking at other parts of the code that the Cythonization may affect (i.e., those related to Lattice and LatticeNode) and that we missed due to a lack of tests.

Memo about splitting in A or B mode:

When using Tokenizer to split text, the splitting from C mode to A/B mode is done by the method Tokenizer._split_path().

However, there are separate methods, Morpheme.split() and MorphemeList.split(), which are independent of the above Tokenizer method.

There were no test cases for the latter, so this issue was not discovered until now.

polm commented

Sorry I missed this issue too... I thought I had checked the Cythonized attributes during development, but obviously I missed some. I'll take a look and see what else I missed.

I have released yet another version, v0.4.9, to fix this issue.

Thank you so much! @sorami and @polm