CGATOxford/CGATCore

Running pipelines locally with Ubuntu

Closed this issue · 3 comments

From my understanding of Execution.py, statements run locally (e.g --local option with pipeline command), are executed using subprocess.Popen() with shell=True. Thus, the statement is prefixed with /bin/sh to run with the shell. On most systems this is bash but on e.g Ubuntu, this is dash. I believe this causes an error with our statements since running using Ubuntu gives the following error:

bin/sh: 11: Bad substitution\n' which seeems to be related to using dash to run a bash-compatible script.

I'm reporting this issue second hand however, so I haven't had the chance to absolutely confirm this is the issue.

If I'm right, should we allow users to define the shell, perhaps via an optional UNIX variable?

P.s. The docs for subprocess.Popen() also says:

Warning
Using shell=True can be a security hazard. See the warning under Frequently Used Arguments for details.

Note
Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes. 

We're doing both. Is this an issue?

#4 fixes it for me.
I am using Ubuntu + zsh, can't say about other environments though.

Many thanks both for your feedback!

Please note that this is an old repo, and CGATCore is under development in a new repo:
https://github.com/cgat-developers/cgat-core

Could you please install and use the new version?

Thanks all.
I think the security issue would be relevant if you set up a pipeline, but let in run by somebody else. I assume they could then manage to run things with your privileges, instead of their own.

Thanks for flagging stdout/stderr. The stdout/stderr should not be a major issue as ideally a task saves its stdout to a file (P.run() returns metrics, not the contents of stdout). It has not been an issue in the past, but using communicate() might be safer.