cgat-developers/cgat-core

help overwriting SLURM default params

Closed this issue · 5 comments

Hi all, thanks for supporting this pipeline language!
I am attempting to customize my pipeline to run on a Slurm cluster.

In issue #132 (comment), @sebastian-luna-valero pointed out how to print the default configuration:

python pipeline.py printconfig may help troubleshoot the issue. Specifically there should be a section: List of .yml files used to configure the pipeline in the output with the list of yaml files being picked up and their associated priority to load them up (higher priority overrides lower ones)

Originally posted by @sebastian-luna-valero in #132 (comment)

my .cgat.yml is minimal:

cluster:
    options: --partition=cpu_p --qos=icb
    queue_manager: "slurm"

When I call the pipeline, it fails with "Invalid partition name specified" because the job is submitted with the following parameters:

# 2022-08-22 16:52:17,289 INFO job-options: -J Sample2_scrublet_scores.txt --partition=cpu_p --qos=icb --cpus-per-task=2 --partition=all.q

This is what python pipeline.py printconfig returns for the cluster parameters:

cluster = {'queue': 'all.q', 'priority': -10, 'num_jobs': 100, 'memory_resource': 'mem_free', 'memory_default': 'unlimited', 'memory_ulimit': False, 'options': '--partition=cpu_p --qos=icb', 'parallel_environment': 'dedicated', 'queue_manager': 'sge', 'tmpdir': False}
cluster_memory_default = 4G
cluster_memory_resource = mem_free
cluster_memory_ulimit = False
cluster_num_jobs = 100
cluster_options = --partition=cpu_p --qos=icb
cluster_parallel_environment = dedicated
cluster_priority = -10
cluster_queue = all.q
cluster_queue_manager = slurm

  1. The printed parameters suggest that the queue manager defaults to sge in the cluster dictionary, even though cluster_queue_manager is reported as slurm further down.
  2. The --partition I pass through options conflicts with the default queue parameter (all.q), which I do not set in my config and which ends up as a second --partition flag on the job. I worked around this by setting queue instead:

cluster:
    options: --qos=icb
    queue_manager: "slurm"
    queue: cpu_p

I am just not sure which part of the code hierarchy specifies the default parameters. I assumed it would pick up my config and overwrite everything else, but what I see is a mix of my parameters and some defaults. I just thought I'd post this in case it helps someone else!
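For anyone else puzzled by the "mix of my parameters and some defaults": this is what a key-by-key merge of layered config files produces. The sketch below is only an illustration of that idea, not cgat-core's actual loading code. The function name merge_config and the DEFAULTS dictionary are hypothetical, with the default values copied from the printconfig output above.

```python
import copy

# Hypothetical defaults, with values taken from the printconfig output in this thread.
DEFAULTS = {
    "cluster": {
        "queue": "all.q",
        "priority": -10,
        "num_jobs": 100,
        "queue_manager": "sge",
    }
}


def merge_config(base, override):
    """Return a copy of `base` with `override` applied key by key.

    Nested dictionaries are merged recursively, so keys that the
    override does not mention keep their value from `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The minimal .cgat.yml from the original post, written as a dictionary.
user_config = {
    "cluster": {
        "options": "--partition=cpu_p --qos=icb",
        "queue_manager": "slurm",
    }
}

merged = merge_config(DEFAULTS, user_config)
print(merged["cluster"])
```

Under this model, queue stays at all.q because the user file never mentions it, while queue_manager and options take the user's values.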

thank you!

Hey Fabiola,

Are you using the latest cgat-core? I made a pull request here because I realised the pipeline wasn't outputting the updated parameters, only the default sge ones (#158). This should solve point 1.

To me, the problem seems to be that the partition is being specified twice: once as cluster_queue: all.q (the default) and once in cluster_options: --partition=cpu_p. I think the scheduler is likely getting confused.
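To make the duplication concrete, here is a rough sketch of how a job-option string like the one in the log could be assembled. It is an assumption based on the logged job-options line, not the real cgat-core code, and the helper names build_job_options and partitions_requested are made up for this example.

```python
def build_job_options(job_name, cluster, threads=1):
    """Assemble a Slurm-style option string (illustrative only).

    The free-form `options` string is passed through unchanged, and
    the `queue` value is added as its own --partition flag.
    """
    parts = ["-J", job_name]
    if cluster.get("options"):
        parts.append(cluster["options"])
    parts.append(f"--cpus-per-task={threads}")
    if cluster.get("queue"):
        parts.append(f"--partition={cluster['queue']}")
    return " ".join(parts)


def partitions_requested(option_string):
    """List every partition named by a --partition=... flag."""
    return [
        token.split("=", 1)[1]
        for token in option_string.split()
        if token.startswith("--partition=")
    ]


# Original config: partition given via options, queue left at its default.
original = {"queue": "all.q", "options": "--partition=cpu_p --qos=icb"}
# Corrected config: partition given via queue only.
corrected = {"queue": "cpu_p", "options": "--qos=icb"}

for cluster in (original, corrected):
    opts = build_job_options("Sample2_scrublet_scores.txt", cluster, threads=2)
    print(opts, partitions_requested(opts))
```

With the original config the string carries both cpu_p and all.q, which matches the logged job-options line; with the corrected config only cpu_p remains.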

Your fix should have solved it:

cluster:
    options: --qos=icb
    queue_manager: "slurm"
    queue: cpu_p

Do you still get problems?

Thanks Adam for looking into this!
As you pointed out, my "corrected" config file fixes the issue, but even after installing the latest cgat-core (cgatcore==0.6.14?) the cluster params still say "sge", although this must be overridden at run time because the interaction with the scheduler is fine.

cluster = {'queue': 'cpu_p', 'priority': -10, 'num_jobs': 100, 'memory_resource': 'mem_free', 'memory_default': 'unlimited', 'memory_ulimit': False, 'options': '--qos=icb', 'parallel_environment': 'dedicated', 'queue_manager': 'sge', 'tmpdir': False}
cluster_memory_default = 4G
cluster_memory_resource = mem_free
cluster_memory_ulimit = False
cluster_num_jobs = 100
cluster_options = --qos=icb
cluster_parallel_environment = dedicated
cluster_priority = -10
cluster_queue = cpu_p
cluster_queue_manager = slurm

Could you include the whole output of pipeline.log?

python ~/devel/github/pipeline_manuscript/sc_pipelines_muon_dev/panpipes/pipeline_qc_mm.py printconfig > configuration.yml

configuration.txt

Ah, this is expected behaviour. Maybe I should add a warning that the first output of the pipeline is the default settings (to help debug). When you run the pipeline, you should see that it outputs the configuration variables twice: once with the default config and once with the updated config values. When I have time I will add a note clarifying this.