taishi-i/nagisa

Returning a generator instead of a list in nagisa.postagging

BLKSerene opened this issue · 3 comments

Hi, I'm trying to figure out how to POS-tag a list of tokens that have already been tokenized, and I found #8, which works fine.

I think returning a generator instead of a list would be better for users, since a list holds every POS tag in memory at once, which is costly for a large input text. In most cases, the returned POS tags are iterated over (usually only once) to be zipped with the tokens.

Alternatively, you could provide two functions, such as postagging and lpostagging, the former returning a generator and the latter returning a plain list.
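To make the proposal concrete, here is a minimal sketch of the two variants. It is not nagisa's implementation: toy_tag is a hypothetical stand-in for a real tagger, and tagging one token at a time is an assumption made only to keep the example self-contained.

```python
from typing import Callable, Iterable, Iterator, List


def toy_tag(token: str) -> str:
    """Hypothetical per-token tagger used only for illustration."""
    return "NUM" if token.isdigit() else "WORD"


def postagging(tokens: Iterable[str],
               tag: Callable[[str], str] = toy_tag) -> Iterator[str]:
    """Yield POS tags one at a time, so no full list is kept in memory."""
    for token in tokens:
        yield tag(token)


def lpostagging(tokens: Iterable[str],
                tag: Callable[[str], str] = toy_tag) -> List[str]:
    """Return all POS tags as a plain list by draining the generator."""
    return list(postagging(tokens, tag))


tokens = ["Python", "3", "is", "fun"]

# Typical use: iterate once, zipping each tag with its token.
pairs = list(zip(tokens, postagging(tokens)))
```

The generator version can only be iterated once, which is why a list-returning variant is still useful for callers that need indexing or repeated passes.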

Hi BLKSerene,

Thank you for your advice. I'm trying to implement a generator that returns POS tags for a list of tokens that have already been tokenized. Please give me a week to complete it.

I'm sorry it took me so long to fix this issue. I solved it by using @property in tagger.py, which behaves somewhat like a generator.

nagisa/nagisa/tagger.py

Lines 245 to 249 in 07de25f

@property
def postags(self):
    if self.__postags is None:
        self.__postags = self.__postagging(self.words, self.__lower)
    return self.__postags

With this approach, the list of POS tags is no longer built in memory immediately; it is computed only when postags is first accessed. You can use this through the nagisa.tagging() function in v0.2.3.
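The lazy-property pattern described above can be sketched in isolation as follows. The class LazyTagged and the function toy_tagger are hypothetical names for illustration, not part of nagisa. Note that in this pattern the full list is still built on first access and then cached, so it defers the work rather than streaming tags one by one.

```python
from typing import Callable, List, Optional


class LazyTagged:
    """Hypothetical result object that computes POS tags on first access."""

    def __init__(self, words: List[str],
                 tag_fn: Callable[[List[str]], List[str]]) -> None:
        self.words = words
        self._tag_fn = tag_fn
        self._postags: Optional[List[str]] = None  # nothing computed yet

    @property
    def postags(self) -> List[str]:
        # Run the tagger only once, then reuse the cached result.
        if self._postags is None:
            self._postags = self._tag_fn(self.words)
        return self._postags


calls: List[int] = []  # records each time the tagger actually runs


def toy_tagger(words: List[str]) -> List[str]:
    """Hypothetical tagger standing in for a real model."""
    calls.append(len(words))
    return ["NUM" if w.isdigit() else "WORD" for w in words]


result = LazyTagged(["nagisa", "0", "2", "3"], toy_tagger)
calls_before_access = len(calls)  # tagger has not run at this point

first = result.postags   # triggers the tagger
second = result.postags  # served from the cache
```

If the tags are never read, for example when only the words are needed, the tagger is never invoked at all.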

Thank you.

Thanks!