CAMI-challenge/CAMISIM

metagenomesimulation.py raising "ERROR: [MetagenomeSimulationPipeline] 'gb|*********.1|:1-****' in line 117" Error

cberta11 opened this issue · 3 comments

Good afternoon,
After successfully running metagenomesimulation.py on the provided sample dataset in defaults/mini_config.ini, I am having issues running the script on my own genomes. Running metagenomesimulation.py with defaults/mini_config.ini edited to contain paths to my own genomes returns:
image
while running metagenomesimulation.py with defaults/default_config.ini, with similar parameters, likewise returns
image
I am unsure what is causing the error. As I said, the script ran without issue on mini_config with the example dataset you provided. Any help would be greatly appreciated.
Joe

I have provided the following below for your convenience:
1.) metagenomesimulation.py defaults/mini_config.ini --debug print out
2.) defaults/mini_config.ini used
3.) metagenomesimulation.py defaults/default_config.ini --debug print out
4.) defaults/default_config.ini used
5.) mamba environment packages
6.) metadata.tsv used
7.) genome_to_id.tsv used

1.) metagenomesimulation.py defaults/mini_config.ini --debug print out
image

2.) defaults/mini_config.ini used
image

3.) metagenomesimulation.py defaults/default_config.ini --debug print out
image

4.) defaults/default_config.ini used
image

5.) packages list for the mamba environment used
image

6.) metadata.tsv used
image

7.) genome_to_id.tsv used
image

I also thought to try setting anonymous reads to false since the issue seems to come from that portion of the metagenomesimulation.py script:
image

However, this just threw the flag:
image

So it seems to throw the error with whatever comes after the assembly stage?

Unfortunately, CAMISIM has some problems with special characters in sequence names. I assume the obscure error message you receive (gb|OL88421.1|:1-1451_2) is a sequence name in the Acidovorax caeni genome. I thought I had fixed that error in the latest version, but it still seems to pop up from time to time. It should work if you remove all hyphens (-), and maybe also underscores (_) just to be sure, from the sequence names within the FASTA file(s).
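
For anyone hitting the same error, here is a minimal sketch of the suggested workaround in Python. It is not part of CAMISIM: the function names sanitize_header and sanitize_fasta and the file paths in the usage note are hypothetical, and it assumes the sequence name is the first whitespace-delimited token of each ">" header line. It only strips - and _ from that name and leaves descriptions and sequence lines untouched.

```python
import re
from pathlib import Path


def sanitize_header(header: str) -> str:
    """Strip '-' and '_' from the sequence name of one FASTA header line.

    `header` must start with '>' and carry no trailing newline.
    Anything after the first whitespace (the description) is kept as-is.
    """
    match = re.match(r">(\S*)(.*)", header)
    name, rest = match.groups()
    return ">" + re.sub(r"[-_]", "", name) + rest


def sanitize_fasta(src: Path, dst: Path) -> int:
    """Copy `src` to `dst` with cleaned headers; return how many headers changed."""
    changed = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if line.startswith(">"):
                original = line.rstrip("\n")
                cleaned = sanitize_header(original)
                if cleaned != original:
                    changed += 1
                fout.write(cleaned + "\n")
            else:
                # Sequence lines are written through unchanged.
                fout.write(line)
    return changed
```

Usage would look something like sanitize_fasta(Path("genomes/my_genome.fasta"), Path("genomes/my_genome.clean.fasta")), with the cleaned file then referenced in genome_to_id.tsv. Note that removing characters can make two previously distinct names identical, so it is worth checking the cleaned file for duplicate sequence names before rerunning the pipeline.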

Thank you so much for your help! Removing the dashes and underscores has fixed the issue.