polm/fugashi

Pickling error when multiprocessing

Closed this issue · 2 comments

When I tried to use fugashi with multiprocessing, I got the following error.

File "stringsource", line 2, in fugashi.fugashi.GenericTagger.__reduce_cython__
self.c_tagger cannot be converted to a Python object for pickling

polm commented

The Tagger object is a C++ managed object and thus can't be pickled; by default, pickling only works for pure Python objects. It's possible for fugashi to add pickling support, but because it's pretty easy to just recreate the Tagger, I haven't done that. If someone wants to submit a PR, I would be open to it as long as it's not too complex.

Note that this has nothing to do with multiprocessing, and happens even in vanilla use cases.

If you want a workaround for your specific use case, I recommend saving the Tagger args you're using (if any) and creating a tagger in each process.

Thank you for your explanation and clarification. I've already used that workaround. It's OK to close this issue.