metagenome-atlas/atlas

How to rerun after changing parameters

mdehollander opened this issue · 3 comments

How can you rerun atlas, for example after changing the assembly parameters in the config.yaml?

I tried with the list-params-changes flag from snakemake as suggested in the Snakemake FAQ, but that does not work for me:

$ atlas run assembly --list-params-changes
[Atlas] INFO: Atlas version: 2.11.0
Config file /****/.conda/envs/atlasenv/lib/python3.10/site-packages/atlas/workflow/config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
WorkflowError in line 358 of *****/.conda/envs/atlasenv/lib/python3.10/site-packages/atlas/workflow/Snakefile:
Resource _nodes is of type int but global resource constraint defines <class 'str'> with value 80. Resources with the same name need to have the same types (int, float, or str are allowed).
[Atlas] CRITICAL: Command 'snakemake --snakefile *****/.conda/envs/atlasenv/lib/python3.10/site-packages/atlas/workflow/Snakefile --directory *****/analysis/*****/atlas --jobs 80 --rerun-incomplete --configfile '*****/analysis/*****/atlas/config.yaml' --nolock   --use-conda --conda-prefix *****/analysis/*****/atlas/databases/conda_envs    --resources mem=717 mem_mb=734995 java_mem=610   --scheduler greedy  assembly  --list-params-changes ' returned non-zero exit status 1.

Using snakemake --delete-all-output works, but it removes all files (such as the QC fastqs), not only those affected by the params change. Any suggestions for an easier way to do this? (I managed with a bash command using for, find and rm, but that is not very convenient ;)

Sorry for the delay.

In theory, you should be able to list the affected rules and rerun them. I don't completely understand what the snakemake error is.
I could help to make this work.

For example try:

snakemake --snakefile *****/.conda/envs/atlasenv/lib/python3.10/site-packages/atlas/workflow/Snakefile --directory *****/analysis/*****/atlas --configfile '*****/analysis/*****/atlas/config.yaml' assembly --list-params-changes

without the jobs argument.

However, I fear that if something goes wrong during the rerun, you don't know which samples have the new assembly and which are the old ones.
Therefore, I prefer to delete the key output files and rerun the assembly.

From inside your working directory you can run:

rm -r */assembly
rm */*_contigs.fasta

and then rerun atlas.
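
The two rm commands above can also be wrapped in a small helper that first prints what it would delete. This is only a sketch: clean_assembly is a hypothetical function, not part of atlas, and it assumes the layout implied above (one directory per sample containing an assembly folder and a *_contigs.fasta file).

```shell
# Sketch: remove per-sample assembly outputs so that atlas rebuilds them.
# clean_assembly is a hypothetical helper, not an atlas command.
# Assumed layout: <workdir>/<sample>/assembly and <workdir>/<sample>/*_contigs.fasta
clean_assembly() {
    local workdir="$1"
    local mode="${2:-dry-run}"   # pass "delete" to actually remove files
    local target
    for target in "$workdir"/*/assembly "$workdir"/*/*_contigs.fasta; do
        # Skip glob patterns that matched nothing
        [ -e "$target" ] || continue
        if [ "$mode" = "delete" ]; then
            rm -r -- "$target"
        else
            echo "would remove: $target"
        fi
    done
}
```

Running `clean_assembly .` from the working directory only lists the targets; `clean_assembly . delete` removes them. Other outputs, such as the QC reads, are not touched.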

Alternatively, you could start a new atlas project. Copy the config.yaml file and samples.tsv, and specify the new assembler in the copy of the config.yaml.
In the sample table, remove the columns for the raw reads and change the paths of the QC reads to absolute paths. This way you can compare the output of both assemblers.

No worries. Thanks for your reply.

The jobs argument is indeed the problem. When I replace --jobs 80 with -c 80 and keep all the other original arguments, it does list the files that have to be rerun because of the config change.
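
For anyone who wants to go from that list to an actual rerun, here is a rough sketch following the pattern from the Snakemake FAQ (feed the output of --list-params-changes to --forcerun). preview_param_reruns is a hypothetical helper, the Snakefile and working directory are passed in as placeholders, and it assumes that --list-params-changes prints only file paths on stdout. It uses -c instead of --jobs, as described above.

```shell
# Sketch: dry-run a forced rerun of outputs whose params changed.
# preview_param_reruns is a hypothetical helper, not part of atlas or snakemake.
# Arguments: <snakefile> <workdir> <cores> <target>
preview_param_reruns() {
    local snakefile="$1"
    local workdir="$2"
    local cores="$3"
    local target="$4"
    local changed

    # Step 1: ask snakemake which output files are affected by changed params
    changed=$(snakemake --snakefile "$snakefile" --directory "$workdir" \
        --configfile "$workdir/config.yaml" -c "$cores" \
        "$target" --list-params-changes) || return 1

    if [ -z "$changed" ]; then
        echo "no outputs affected by parameter changes"
        return 0
    fi

    # Step 2: dry-run with those files forced. $changed is left unquoted on
    # purpose so each path becomes its own argument (breaks on paths with spaces).
    snakemake --snakefile "$snakefile" --directory "$workdir" \
        --configfile "$workdir/config.yaml" -c "$cores" \
        "$target" --dry-run --forcerun $changed
}
```

Once the dry-run output looks right, the same command without --dry-run (plus the --use-conda and --conda-prefix options from the original atlas call) would do the actual rerun.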

Maybe this is related to this bug report in snakemake: snakemake/snakemake#1589? The use of --jobs and -c changed in snakemake in the last year or so, if I remember correctly. I am not using a scheduler; the jobs run locally. And I am using snakemake 7.3.8, which comes with my atlas installation.

I know that there is this ambiguity between cores and jobs, but I don't know if there would be a simple solution.