jeetsukumaran/DendroPy

Multithreaded option fails with no error message - leaf labels issue?

AthinaGav opened this issue · 2 comments

I was using sumtrees.py with the multithreaded option (-m), but it produced only an empty .sumtrees file, and the analysis appeared to run forever without using any resources. The only error message, which appeared among the normal process start-up messages and did not stop the analysis, was this:

TypeError: cannot pickle '_io.TextIOWrapper' object
Traceback (most recent call last):
  File ".../anaconda3/envs/dendropy/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../anaconda3/envs/dendropy/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
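For anyone trying to interpret this message: the same TypeError can be triggered with the standard library alone by pickling an open text file handle. The sketch below does only that. It makes no claim about which object inside sumtrees.py held the handle, and the helper name `pickle_error_for_open_file` is made up for illustration.

```python
import os
import pickle
import tempfile


def pickle_error_for_open_file():
    """Try to pickle an open text file handle; return the error message, if any."""
    # Create a throwaway file so the example does not depend on any real path.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as handle:
            try:
                # Open file objects cannot be pickled, so this raises TypeError.
                pickle.dumps(handle)
            except TypeError as exc:
                return str(exc)
        return None
    finally:
        os.remove(path)


message = pickle_error_for_open_file()
print(message)  # the exact wording varies between Python versions
```

In the traceback above, the failure happens in `_feed`, the background thread that multiprocessing queues use to serialize objects. An error there is printed but not raised in the main program, so the item never arrives. That is consistent with a run that hangs without using resources, although it is not confirmed as the cause here.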

After a lot of trial and error (for a while I thought there was a Python version conflict, which was odd because everything had worked fine a while ago), I realized what the problem was. The labels in my input trees included two that the tool treated as duplicates: StRir_1 and StRiR_1. They differ only in the case of one letter.
I did not expect that, and it was really difficult to figure out since the error message seemed unrelated. After renaming one of the labels, everything ran smoothly.
I am reporting this here in case someone else comes across the issue. I would also suggest a more informative error message, or perhaps an additional check in the multithreaded version.
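As a workaround until such a check exists, labels can be screened for this kind of collision before running the analysis. This is a minimal sketch in plain Python, not part of DendroPy: the function name `find_case_insensitive_duplicates` is made up, `Other_2` is a placeholder label, and the labels are assumed to have already been extracted from the trees into a list of strings.

```python
from collections import defaultdict


def find_case_insensitive_duplicates(labels):
    """Return groups of labels that differ as written but match when case is ignored."""
    groups = defaultdict(set)
    for label in labels:
        # casefold() gives a case-insensitive key; the set keeps distinct spellings.
        groups[label.casefold()].add(label)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


# StRir_1 and StRiR_1 are the labels from this report; Other_2 is a placeholder.
labels = ["StRir_1", "StRiR_1", "Other_2"]
for group in find_case_insensitive_duplicates(labels):
    print("Labels differing only in case:", ", ".join(group))
```

Any group this reports is a candidate for renaming before the trees are passed to sumtrees.py.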

Thanks again to the devs for this nice tool.

Hi @AthinaGav --- would it be possible to send an example script/data where this occurs so I can try to reproduce it on our end?

Closing this for now, feel free to reopen if this is still relevant. 👍