Help overriding SLURM default params
bio-la opened this issue · 5 comments
Hi all, thanks for supporting this pipeline language!
I am attempting to customize my pipeline to run on a SLURM cluster.
In this issue #132 (comment), @sebastian-luna-valero pointed out how to print the default configuration:
python pipeline.py printconfig
may help troubleshoot the issue. Specifically, there should be a section, "List of .yml files used to configure the pipeline", in the output listing the yaml files being picked up and their associated load priority (higher priority overrides lower ones)
Originally posted by @sebastian-luna-valero in #132 (comment)
My .cgat.yml is minimal:
cluster:
    options: --partition=cpu_p --qos=icb
    queue_manager: "slurm"
When I call the pipeline, it fails with "Invalid partition name specified" because the job has the following parameters:
# 2022-08-22 16:52:17,289 INFO job-options: -J Sample2_scrublet_scores.txt --partition=cpu_p --qos=icb --cpus-per-task=2 --partition=all.q
This is what python pipeline.py printconfig returns for the cluster params:
cluster = {'queue': 'all.q', 'priority': -10, 'num_jobs': 100, 'memory_resource': 'mem_free', 'memory_default': 'unlimited', 'memory_ulimit': False, 'options': '--partition=cpu_p --qos=icb', 'parallel_environment': 'dedicated', 'queue_manager': 'sge', 'tmpdir': False}
cluster_memory_default = 4G
cluster_memory_resource = mem_free
cluster_memory_ulimit = False
cluster_num_jobs = 100
cluster_options = --partition=cpu_p --qos=icb
cluster_parallel_environment = dedicated
cluster_priority = -10
cluster_queue = all.q
cluster_queue_manager = slurm
- The printed params seem to suggest it defaults to sge (even though it picks up cluster_queue_manager = slurm afterwards).
- The --partition I pass via the options param is overridden by the "queue" param, which I do not set in the config, so I worked around it by specifying the partition as the queue param instead:
cluster:
    options: --qos=icb
    queue_manager: "slurm"
    queue: cpu_p
I am just not sure which part of the code hierarchy sets the default params. I assumed my config would be picked up and override everything else, but what I see is a mix of my params and some defaults. Just posting in case it helps someone else!
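For reference, this is my (naive) mental model of how .cgat.yml should override the shipped defaults; it is not cgat-core's actual code, and the default values are abbreviated from the printconfig output above:

```python
# Naive sketch of what I expected (not cgat-core's actual code): values from
# .cgat.yml override the shipped defaults section by section, key by key.
# The defaults below are abbreviated from the printconfig output above.
import copy
import yaml

defaults = {
    "cluster": {
        "queue": "all.q",
        "queue_manager": "sge",
        "options": "",
    }
}

with open(".cgat.yml") as fh:
    user_config = yaml.safe_load(fh) or {}

merged = copy.deepcopy(defaults)
for section, values in user_config.items():
    if isinstance(values, dict):
        merged.setdefault(section, {}).update(values)
    else:
        merged[section] = values

print(merged["cluster"])
# {'queue': 'all.q', 'queue_manager': 'slurm', 'options': '--partition=cpu_p --qos=icb'}
```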
Thank you!
Hey Fabiola,
Are you using the latest cgat-core? I made a pull request here because I realised the pipeline wasn't outputting the updated parameters, only the default sge ones (#158). This should solve point 1.
To me the problem seems to be that the partition is being specified twice, since you are getting it both from cluster_queue: all.q (the default) and from cluster_options: --partition=cpu_p. I think the scheduler is likely getting confused.
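Roughly, the job options get assembled along these lines (a simplified sketch, not the actual submission code cgat-core runs), which is why --partition shows up twice in your log:

```python
# Simplified sketch (not cgat-core's actual submission code): the user-supplied
# cluster options and the default queue are both folded into the job options,
# so --partition ends up being set twice.
cluster = {"queue": "all.q", "options": "--partition=cpu_p --qos=icb"}

job_options = "{options} --cpus-per-task=2 --partition={queue}".format(**cluster)
print(job_options)
# --partition=cpu_p --qos=icb --cpus-per-task=2 --partition=all.q
```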
Your fix should have solved it:
cluster:
    options: --qos=icb
    queue_manager: "slurm"
    queue: cpu_p
Do you still get problems?
Thanks Adam for looking into this!
As you pointed out, my "corrected" config file fixes the issue, but even after installing the latest cgat-core (cgatcore==0.6.14?) the printed cluster params still say "sge", although this must be overridden in practice, because the interaction with the scheduler is fine.
cluster = {'queue': 'cpu_p', 'priority': -10, 'num_jobs': 100, 'memory_resource': 'mem_free', 'memory_default': 'unlimited', 'memory_ulimit': False, 'options': '--qos=icb', 'parallel_environment': 'dedicated', 'queue_manager': 'sge', 'tmpdir': False}
cluster_memory_default = 4G
cluster_memory_resource = mem_free
cluster_memory_ulimit = False
cluster_num_jobs = 100
cluster_options = --qos=icb
cluster_parallel_environment = dedicated
cluster_priority = -10
cluster_queue = cpu_p
cluster_queue_manager = slurm
Could you include the whole output of pipeline.log?
python ~/devel/github/pipeline_manuscript/sc_pipelines_muon_dev/panpipes/pipeline_qc_mm.py printconfig > configuration.yml
Ah, this is expected behaviour. Maybe I should add a warning that the first output of the pipeline is the default settings (to help debug). When you run the pipeline, you should see that it outputs the configuration variables twice: once with the default config and a second time with the updated config values. When I have time I will add a note clarifying this.
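In the meantime, if the double output is confusing, something like this quick ad hoc check (not part of cgat-core) keeps only the last value printed for each parameter, which is the one that actually applies:

```python
# Quick ad hoc check (not part of cgat-core): since the default settings are
# printed before the updated ones, keep only the last value seen for each
# "key = value" line; that is the value which actually applies.
final = {}
with open("configuration.yml") as fh:  # the redirected printconfig output above
    for line in fh:
        if " = " in line:
            key, _, value = line.partition(" = ")
            final[key.strip()] = value.strip()

print(final.get("cluster_queue_manager"))  # slurm, not the default sge
```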