labgem/PPanGGOLiN

TypeError: invalid type (<class 'str'>) for column ``organism``

raysully opened this issue · 4 comments

Hi, I ran into an error while running ppanggolin 1.2.105 and I'm not sure what the problem is. Would appreciate your help:

(ppanggolin) gene@precision6:~/Micrococcus/ppanggolin$ ppanggolin workflow --anno Micrococcus-120.tsv -c 16
2023-12-22 11:11:41 utils.py:l116 INFO Command: /home/gene/miniconda3/envs/ppanggolin/bin/ppanggolin workflow --anno Micrococcus-120.tsv -c 16
2023-12-22 11:11:41 utils.py:l117 INFO PPanGGOLiN version: 1.2.105
2023-12-22 11:11:41 annotate.py:l448 INFO Reading Micrococcus-120.tsv the list of organism files ...
100%|███████████████████████████████████████| 120/120 [00:03<00:00, 37.70file/s]
2023-12-22 11:11:44 annotate.py:l472 INFO gene identifiers used in the provided annotation files were not unique, PPanGGOLiN will use self-generated identifiers.
2023-12-22 11:11:44 writeBinaries.py:l895 INFO Writing genome annotations...
7%|██▌ | 8/120 [00:00<00:00, 401.98genome/s]
Traceback (most recent call last):
File "tables/tableextension.pyx", line 1676, in tables.tableextension.Row.setitem
TypeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/gene/miniconda3/envs/ppanggolin/bin/ppanggolin", line 10, in <module>
sys.exit(main())
File "/home/gene/miniconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/main.py", line 159, in main
ppanggolin.workflow.workflow.launch(args)
File "/home/gene/miniconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/workflow/workflow.py", line 38, in launch
write_pangenome(pangenome, filename, args.force, disable_bar=args.disable_prog_bar)
File "/home/gene/miniconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/formats/writeBinaries.py", line 897, in write_pangenome
write_annotations(pangenome, h5f, disable_bar=disable_bar)
File "/home/gene/miniconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/formats/writeBinaries.py", line 118, in write_annotations
gene_row["organism"] = org.name
File "tables/tableextension.pyx", line 1681, in tables.tableextension.Row.setitem
TypeError: invalid type (<class 'str'>) for column organism
/home/gene/miniconda3/envs/ppanggolin/lib/python3.10/site-packages/tables/file.py:113: UnclosedFileWarning:

Closing remaining open file: ppanggolin_output_DATE2023-12-22_HOUR11.11.41_PID3991613/pangenome.h5

Hi,

I think an unexpected character (probably non-ASCII) was used in one of the organism names listed in the Micrococcus-120.tsv file, and that is probably why it crashed there.
Judging by where it crashed, it is likely the 8th name in the file, though I'm not 100% sure.
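
A quick way to look for such characters is a small script along these lines. It is only a sketch: `find_suspicious_chars` and `scan_file` are made-up helper names, not part of PPanGGOLiN, and the script simply reports every character that is neither printable ASCII nor a tab, together with its line and position.

```python
import unicodedata


def find_suspicious_chars(lines):
    """Return (line_no, position, repr, unicode_name) for each odd character.

    "Odd" here means anything that is not printable ASCII and not a tab,
    e.g. a carriage return, a byte order mark or a non-breaking space.
    """
    findings = []
    for line_no, line in enumerate(lines, start=1):
        # Strip only the trailing "\n" so that a stray "\r" stays visible.
        text = line[:-1] if line.endswith("\n") else line
        for pos, ch in enumerate(text):
            if ch == "\t":
                continue
            if ord(ch) < 32 or ord(ch) > 126:
                # Control characters have no Unicode name, hence the default.
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((line_no, pos, repr(ch), name))
    return findings


def scan_file(path):
    """Print every suspicious character found in the file at `path`."""
    # newline="" stops Python from silently translating "\r\n" to "\n".
    with open(path, encoding="utf-8", newline="") as handle:
        for line_no, pos, shown, name in find_suspicious_chars(handle):
            print(f"line {line_no}, position {pos}: {shown} ({name})")
```

Calling `scan_file("Micrococcus-120.tsv")` would list anything unusual. This assumes the file is UTF-8 encoded; if it is not, `open` may raise a `UnicodeDecodeError`, which is itself a hint that the encoding is the problem.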

If that does not help you solve your problem, would you mind sharing your Micrococcus-120.tsv file so I can take a look at it?

Adelme

Thanks for your quick reply, Adelme! I reviewed the file but can't see the error. Hopefully you can point it out.

I had to zip it. GitHub doesn't seem to support .tsv files.
Micrococcus-120.zip

Hi,

I managed to reproduce the error using the names in your file. There are some hidden characters in the names, probably introduced when the file was written on Windows before being moved to Linux (that could have happened to this file, or to wherever the names come from).

I couldn't pinpoint where the characters were, but if you run the following command on your file before using ppanggolin:

dos2unix Micrococcus-120.tsv

it will remove those hidden characters, and your file should be processed normally!
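
If dos2unix isn't available, something like the following Python snippet should have a similar effect in this case. It is a rough approximation rather than a reimplementation of dos2unix: it assumes the hidden characters are a UTF-8 byte order mark and/or Windows-style CRLF line endings, which has not been confirmed for this file. `normalize_bytes` and `normalize_file` are hypothetical names.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def normalize_bytes(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM and turn CRLF line endings into LF."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.replace(b"\r\n", b"\n")


def normalize_file(src: str, dst: str) -> None:
    """Write a cleaned copy of `src` to `dst`, leaving the original untouched."""
    Path(dst).write_bytes(normalize_bytes(Path(src).read_bytes()))
```

For example, `normalize_file("Micrococcus-120.tsv", "Micrococcus-120.clean.tsv")` would write a cleaned copy that can then be passed to `--anno`.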

Adelme

Yes, you nailed it, Adelme! It's weird because I don't own a Windows/DOS machine. Thanks so much!