leylabmpi/Struo2

missing clusters_db.10 and 11

fanhuan opened this issue · 4 comments

This is referring to:
bin/db_create/genes/Snakefile, line 259: X = range(12))),

I was testing with only three genomes and got an error about the missing files clusters_db.10 and clusters_db.11. Looking into the Snakefile, I found that the line mentioned above was the cause. Does it have to be 12, or could I just change it (to 10 in my case)? I was also wondering how 12 was chosen.
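For context, Snakemake's `expand()` substitutes every listed value into a filename pattern, so a hard-coded `range(12)` always makes the rule require exactly 12 files. Below is a minimal stdlib-only sketch of that behaviour. The pattern `clusters_db.{X}` is assumed from the error message (the Snakefile's full paths are not shown here), and `expand_pattern` is a hypothetical helper, not Snakemake's own function.

```python
def expand_pattern(pattern: str, values) -> list[str]:
    """Hypothetical stand-in for Snakemake's expand() with a single wildcard X."""
    return [pattern.format(X=value) for value in values]


# Mirrors `X = range(12)` from the Snakefile: 12 required files, numbered 0 to 11.
required = expand_pattern("clusters_db.{X}", range(12))
print(len(required), required[0], required[-1])
# 12 clusters_db.0 clusters_db.11
```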

Best,
Huan

The same applies to lines 305, 345, and 410 in the same file (bin/db_create/genes/Snakefile).

I'm guessing that you are running the pipeline with <12 threads. The problem is due to mmseqs, which generates a different number of output files depending on the number of threads used. So, if you use <12 threads, mmseqs generates <12 output files, but Snakemake still looks for all 12.
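The mismatch can be sketched in a few lines of Python. This assumes, for illustration only, that mmseqs writes exactly one numbered shard per thread; the thread itself only establishes that fewer than 12 threads yields fewer than 12 files. `shard_names` and `missing_shards` are hypothetical helpers.

```python
def shard_names(prefix: str, count: int) -> list[str]:
    """File names prefix.0 ... prefix.(count - 1)."""
    return [f"{prefix}.{i}" for i in range(count)]


def missing_shards(prefix: str, expected: int, threads: int) -> list[str]:
    """Shards the Snakefile requires but that are never written.

    Assumes (for illustration) that mmseqs writes exactly one shard per thread.
    """
    produced = set(shard_names(prefix, threads))
    return [name for name in shard_names(prefix, expected) if name not in produced]


# The reported case: the Snakefile expects 12 shards, but the run used -j 10.
print(missing_shards("clusters_db", expected=12, threads=10))
# ['clusters_db.10', 'clusters_db.11']
```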
An HPC should be used for Struo2, given the amount of resources required. If you are using an HPC, >=12 CPUs per job should be possible.
I can try to make the pipeline more dynamic via checkpointing, but that will take some time to test.
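A checkpoint-based fix would replace the hard-coded `range(12)` with a list of the shards that actually exist once mmseqs has finished. The discovery step might look like the sketch below; `discover_shards` is a hypothetical helper and the `clusters_db.<N>` naming is assumed from the error message. Files whose suffix is not a plain integer are ignored.

```python
import re
import tempfile
from pathlib import Path


def discover_shards(directory: Path, prefix: str = "clusters_db") -> list[Path]:
    """Return the numbered shards (prefix.0, prefix.1, ...) that exist, in numeric order."""
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d+)$")
    found = []
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match:  # skip files whose suffix is not a plain integer
            found.append((int(match.group(1)), path))
    # Sort numerically so clusters_db.10 comes after clusters_db.9, not after clusters_db.1.
    return [path for _, path in sorted(found)]


if __name__ == "__main__":
    # Simulate a 10-thread run: shards 0-9 plus one non-shard file.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        for i in range(10):
            (tmp_dir / f"clusters_db.{i}").touch()
        (tmp_dir / "clusters_db.index").touch()
        print([p.name for p in discover_shards(tmp_dir)])
```

In Snakemake, this logic would typically live in an input function of the rule downstream of a `checkpoint`, so the file list is evaluated only after mmseqs has run; Snakemake's `glob_wildcards()` could take the place of the regular expression.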

Nick, you are absolutely right. I was running with -j 10. Sorry for my lack of understanding of how mmseqs works. I consider this issue closed.

I'll leave it open, since I hope to make the pipeline handle any number of threads, but that will take some reconfiguration.