Command line arguments not overriding defaults
--cluster-queue-manager slurm not overriding sge when supplied on command line
Yeah, I noticed this a while back, but wanted to wait for a meeting with others before deciding on the option priority, as I think this may need revisiting.
I have a different issue here, in that the options in my .cgat.yml file are not overriding the sge default; however, if I supply the arguments on the command line, then my pipelines run fine with slurm.
I am also having the issue (not sure whether it is related) that if I run my pipeline from a particular Conda environment, the pipeline statements don't seem to be run within that environment unless I add the following:
P.run(statement, job_condaenv=PARAMS["conda_env"])
cgatcore 0.6.7 py_0 bioconda
drmaa 0.7.9 py_1000 conda-forge
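For anyone else hitting this, the workaround sits inside a task roughly as follows (a minimal sketch with hypothetical task and file names; conda_env is a key you define yourself in pipeline.yml or .cgat.yml, not a cgatcore default):

from cgatcore import pipeline as P
from ruffus import transform, suffix

# configuration merged from pipeline.yml plus the global .cgat.yml
PARAMS = P.get_parameters(["pipeline.yml"])

@transform("*.fastq.gz", suffix(".fastq.gz"), ".nreads")
def count_reads(infile, outfile):
    # without job_condaenv, the statement runs in whatever environment the
    # non-interactive shell on the compute node happens to provide
    statement = "zcat %(infile)s | wc -l > %(outfile)s"
    P.run(statement, job_condaenv=PARAMS["conda_env"])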
Hmm, that's strange that the .cgat.yml isn't overriding the sge defaults. I am now on a slurm cluster and can debug to find the issue.
This is the opposite of what David has reported: for you, the command line seems to override correctly.
Regarding the conda env issue, that's strange and suggests that your environment may not be getting copied to the non-interactive shell on the cluster for some reason.
Yes, that's the strange thing: it's the opposite of what David is saying. The command line seems to override for me.
@Acribbs I am still having this issue on the CCB server and I know several others who are as well, including @deevdevil88. Any thoughts?
Hi Lucy,
Things are pretty busy at the moment with grant writing and paper submission, but I will try to take a look in the next few days. In the meantime, can you share your .cgat.yml file and a test dataset that highlights the issues?
Thanks
These are the contents of my .cgat.yml file, which is in my home directory (GitHub doesn't support the .yml file ending):
cluster:
    queue_manager: slurm
    queue: batch
I have to run my pipelines as follows:
python pipeline.py --cluster-queue-manager slurm --cluster-queue batch make full
I also have to specify the conda environment for each statement within P.run().
It works fine on the BMRC SGE server, so I think it is a Slurm issue.
I will send you an example mini pipeline and input files.
Hi Adam!
Similarly to what @lucygarner explains, I found that on my SLURM cluster the worker node won't inherit the environment activated on the login node, so variables and packages are not picked up at all.
Our solution to this is to specify the conda environment in the .cgat.yml file (or separately in each pipeline.yml file, if different environments are needed for each pipeline) and change the P.run(...) calls in the pipeline.py to pass the conda env param, i.e. P.run(statement, job_condaenv=PARAMS["conda_env"]).
I was hoping for a different workaround though, one that doesn't rely on having to rewrite the P.run() statements but defines the same conda env for all jobs.
Any idea?
thanks!!
fabiola
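One way to get that effect without touching every task, sketched here under the assumption that conda_env is defined once in pipeline.yml or .cgat.yml, is a small pipeline-level wrapper around P.run (the helper and the key name are hypothetical, not a built-in cgatcore feature):

from cgatcore import pipeline as P

PARAMS = P.get_parameters(["pipeline.yml"])

def run(statement, **kwargs):
    # hypothetical helper: route every statement through the configured
    # conda environment, falling back to plain P.run if none is set
    conda_env = PARAMS.get("conda_env")
    if conda_env:
        kwargs.setdefault("job_condaenv", conda_env)
    return P.run(statement, **kwargs)

Tasks would then call run(statement) instead of P.run(statement, job_condaenv=PARAMS["conda_env"]), so the environment is declared once per pipeline rather than per statement.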
@bio-la, are your pipeline scripts picking things up from the .cgat.yml file? I seem to be having the additional issue that the info in that file is not picked up.
Yes, my pipeline script picks up PARAMS from .cgat.yml as well as pipeline.yml.
I wasn't aware of this until Charlotte pointed it out to me, and it works; I just tested it.
Hi,
python pipeline.py printconfig
may help troubleshoot the issue. Specifically, there should be a section in the output, "List of .yml files used to configure the pipeline", listing the yaml files being picked up and the priority with which they are loaded (higher priority overrides lower ones).
I hope that helps.
Best regards,
Sebastian
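The merged values can also be inspected from a short script, which makes it easier to see which file wins for a given key (a minimal sketch; it assumes the flattened key names shown by printconfig, e.g. cluster_queue_manager):

from cgatcore import pipeline as P

# load the configuration the same way a pipeline does; cgatcore merges the
# yml files that printconfig lists, including the .cgat.yml in your home directory
PARAMS = P.get_parameters(["pipeline.yml"])

for key in ("cluster_queue_manager", "cluster_queue"):
    print(key, ":", PARAMS.get(key))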
Thanks @sebastian-luna-valero. Very helpful! Interestingly, it says it is giving my .cgat.yml file the highest priority.
List of .yml files used to configure the pipeline
Priority : File
2 : mapping_pipeline.yml
(highest) 1: /home/medawar/lgarner/.cgat.yml
These are the contents of my .cgat.yml in my home directory:
cluster:
    queue_manager: slurm
    queue: batch
But somehow when I run the pipeline, it tries to use sge unless I specify --cluster-queue-manager slurm.
This is what it looks like if I run the pipeline without --cluster-queue-manager slurm:
# 2022-02-10 18:11:19,990 INFO always_mount : False \
# cluster_memory_default : unlimited \
# cluster_memory_resource : None \
# cluster_num_jobs : None \
# cluster_options : None \
# cluster_parallel_environment : None \
# cluster_priority : None \
# cluster_queue : None \
# cluster_queue_manager : sge \
# config_file : pipeline.yml \
Hi,
Please paste the output of:
python pipeline.py printconfig | grep "cluster_queue"
python pipeline.py printconfig --cluster-queue-manager slurm | grep "cluster_queue"
I would like to compare the values of cluster_queue_manager in the lines of the output starting with and without #.
Best regards,
Sebastian
Hi Sebastian,
I met with Lucy and sorted out her issues (bashrc issues), but in the process realised that while the pipeline submits jobs to slurm correctly, it prints out the default sge values at the beginning of the pipeline. I can take a look at this, but I suspect it may be displaying the params before they are updated.
Hi @Acribbs,
Sorry to open this again.
I'm also having the same problem as Lucy, so it would be good to know what the fix is and what I need to add or remove from my bashrc to fix this.
Best
Devika
Can you share your bashrc with me and I can see if it has the issue?
@Acribbs, sure thing, here you go.
Hmm, there doesn't seem to be a major issue with your bashrc; however, can you try commenting out the SHARED_TMPDIR and the DRMAA_LIBRARY lines and running the pipeline again? What was your specific issue? You cannot submit jobs to the cluster?
The issue that affected @lucygarner was related to her environment not inheriting variables in non-interactive shells, fixed by:
if [[ $PS1 ]]; then
    # bash code that should only run in interactive shells
fi
@Acribbs, same issue as @lucygarner: when I run cgatcore pipelines on the CBRG, they don't inherit the params from the .cgat.yml file in my home directory. I am able to run the pipeline if I specify the --cluster-queue-manager and --cluster-queue params in the pipeline make command when I run the pipeline, but not otherwise.
Do you get DRMAA API errors? Can you show me a failed log trace please, and one that works?
You know what, I actually hadn't checked running my pipelines without setting --cluster-queue and --cluster-queue-manager since I last encountered the error back when we discussed this on the issue, and it seems like it works now. I imagine it must have been to do with my .bashrc originally, and I guess back in May last year, when we all used the same .bashrc for the CCB based on Charlie's example, that must have sorted it out. But I didn't realise. Sorry, it seems like it works now.
Ah brilliant!