metagenlab/mummer2circos

There is an error when I use the -gb option.

Thank you for enabling me to use your fantastic tool.
However, I get an error whenever I use the '-gb' option.
I've tried installing it with both conda and Singularity, but I keep hitting the same error, even when I use the example dataset and command from the tutorial.
The version is 1.4.2. The command and the error message are shown below.
Can you provide a solution?

mummer2circos -l -r genomes/NZ_CP008827.fna -q genomes/*.fna -gb GCF_000281535_merged.gbk -b VF.faa

Traceback (most recent call last):
  File "/opt/conda/bin/mummer2circos", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/mummer2circos/__init__.py", line 68, in main
    force_data_dir=args.force)
  File "/opt/conda/lib/python3.7/site-packages/mummer2circos/mummer2circos.py", line 118, in __init__
    minus_file, plus_file = self.gbk2circos_data(gbk2orf, minus_file=f"{self.circos_data_dir}/circos_orf_minus.txt", plus_file=f"{self.circos_data_dir}/circos_orf_plus.txt")
  File "/opt/conda/lib/python3.7/site-packages/mummer2circos/mummer2circos.py", line 1184, in gbk2circos_data
    start = str(feature.location.start + self.contigs_add[record.id][0])
KeyError: 'NZ_CP008828.1'

Hi @fsysy
The error occurs because the record IDs in the GenBank file include version numbers (e.g. 'NZ_CP008828.1'), while the record IDs in the FASTA file do not (e.g. 'NZ_CP008828', without the '.1'). To map records between the FASTA and GenBank files, both files must use the same IDs.
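
A quick way to confirm this kind of mismatch is to compare the IDs in the two files before running the tool. The following is a minimal sketch and not part of mummer2circos: the function names and the miniature sample records are made up for illustration, and it assumes the GenBank record ID comes from the VERSION line, as Biopython-style parsers report it when that line is present.

```python
def fasta_ids(fasta_text):
    """Return the ID (first token after '>') of every FASTA header."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def genbank_versions(gbk_text):
    """Return the accession.version string from every VERSION line."""
    versions = []
    for line in gbk_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "VERSION":
            versions.append(tokens[1])
    return versions


def ids_missing_from_fasta(fasta_text, gbk_text):
    """List GenBank IDs that have no identically named FASTA record."""
    known = set(fasta_ids(fasta_text))
    return [v for v in genbank_versions(gbk_text) if v not in known]


# Made-up miniature inputs that mimic the situation in this issue.
sample_fasta = ">NZ_CP008827\nACGT\n>NZ_CP008828\nACGT\n"
sample_gbk = (
    "LOCUS       NZ_CP008827\nVERSION     NZ_CP008827.1\n//\n"
    "LOCUS       NZ_CP008828\nVERSION     NZ_CP008828.1\n//\n"
)
print(ids_missing_from_fasta(sample_fasta, sample_gbk))
# prints ['NZ_CP008827.1', 'NZ_CP008828.1']
```

Any ID printed here would raise the same kind of KeyError as above, because the lookup is keyed on the FASTA IDs.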

You can fix this by adding version numbers to the fasta file, either manually or with sed:
sed -ri 's/^>(.*)/>\1.1/' genomes/NZ_CP008827.fna
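
Note that the sed expression appends '.1' to the end of the whole header line, which gives the intended result when the header contains only the ID. If the headers also carry a description, a variant that edits just the ID token may be safer. Below is a minimal Python sketch of that idea; `add_version` is a hypothetical helper, not something shipped with mummer2circos, and the default version "1" is an assumption that should be checked against the VERSION lines of the GenBank file.

```python
import re


def add_version(fasta_text, version="1"):
    """Append '.<version>' to each FASTA record ID that has no version yet.

    Only the first token of the header is changed; any description is kept
    (the separator between ID and description becomes a single space).
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            # Skip empty headers and IDs that already end in '.<digits>'.
            if parts and not re.search(r"\.\d+$", parts[0]):
                parts[0] = f"{parts[0]}.{version}"
                line = ">" + " ".join(parts)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up example header with a description after the ID.
print(add_version(">NZ_CP008828 example description\nACGT\n"), end="")
```

Running it on a copy of the FASTA file and writing the result to a new file keeps the original untouched, unlike the in-place `-i` edit.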

Thank you for reporting this problem. I will fix the example dataset.

Great!

I was confused about whether I should match the VERSION or the LOCUS in the GenBank file.

Anyway, that issue has been resolved with your solution.

Thank you @tpillone