LUMC/pytest-workflow

Use of environment variables should be clarified in the documentation

awgymer opened this issue · 9 comments

I am trying to run a test where I have an environment variable in the command but it seems that it is not expanded or something.

The command (which runs as expected outside pytest-workflow) is:

nextflow run ./tests/test -entry test -profile $NF_PROFILE

And the error is a nextflow one: Unknown configuration profile: '$NF_PROFILE'
It appears to be passing $NF_PROFILE literally? Is this the expected behaviour?

It appears to be passing $NF_PROFILE literally? Is this the expected behaviour?

Yes, this uses Python's subprocess which executes the command without a shell. So shell functionality such as variables is not necessarily present.

Can you try bash -c "nextflow run ./tests/test -entry test -profile $NF_PROFILE" ? This will execute the statement in bash.

Ah that makes sense. bash -c works. Out of interest, is it apytest-workflowa doing the subprocess call or pytest itself? I am wondering if there is maybe an option to set shell=True somehow instead to keep the command cleaner (or a future PR to make it possible)

The call is made here in pytest-workflow:

def start(self):
"""Runs the workflow in a subprocess in the background.
To make sure the workflow is finished use the `.wait()` method"""
# The lock ensures that the workflow is started only once, even if it
# is started from multiple threads.
with self.start_lock:
if not self._started:
try:
stdout_h = self.stdout_file.open('wb')
stderr_h = self.stderr_file.open('wb')
sub_process_args = shlex.split(self.command)
self._popen = subprocess.Popen( # nosec: Shell is not enabled. # noqa
sub_process_args, stdout=stdout_h,
stderr=stderr_h, cwd=str(self.cwd))
except Exception as error:
# Append the error so it can be raised in the main thread.
self.errors.append(error)
finally:
self._started = True
stdout_h.close()
stderr_h.close()
else:
raise ValueError("Workflows can only be started once")

I am wondering if there is maybe an option to set shell=True somehow instead to keep the command cleaner (or a future PR to make it possible)

I agree that would make the command cleaner. However this might behave differently depending on the system's shell. To keep behavior consistent I think it is best to not enable shell and let the user explicitly declare which shell is used. This way the behavior is always consistent between runs on different system. Which for a workflow tester I think is very important.

Also it disables some 'dangerous' shell behaviour by default. In the above case someone could set $NF_PROFILE to MYPROFILE && echo "$PG_PASSWORD" && rm -rf --no-preserve-root / for example. That was another reason why explicit shell enabling was chosen.

Maybe this should be highlighted a bit more in the documentation.

Is there really any difference in security between bash -c and using shell=True?
What I was thinking of was more of an option - something like:

- name: test
  command: mytest command --option 1
  shell: true

Which could then be passed via the workflow object to subprocess.

If that's not of interest then more explicit documentation of the issue/solution (e.g. using bash -c ) would be great!

This is an example of it being explicit in the command - but if I have it in my nextflow.config for example:

params.data_dir = $DATA_DIR

then it is less obvious you might need to use bash -c

Is there really any difference in security between bash -c and using shell=True?

Yes. With bash -c you decide that injection attacks are no problem, with shell=True the developer will have decided for you. As a developer, I cannot make that decision for everyone who uses this project.

What I was thinking of was more of an option - something like:

This is a good suggestion. However shell is not as clear as bash -c. shell=True refers to the default shell. This might be different between systems. To prevent workflow tests to crash on different systems it is always better to explicitly declare which shell needs to be used. For example /bin/sh on Debian=based systems will use dash, which might lead to unexpected crashes when bash is expected. Some users might have set fish as their default shell, etc.

So in conclusion, pytest-workflow leaves the decision of running in a shell to the user, and the user must be explicit about the shell used. I think this approach has the least disadvantages.

If that's not of interest then more explicit documentation of the issue/solution (e.g. using bash -c ) would be great!

Yes, that is indeed a good thing to do. I will leave this issue open until I have updated the documentation.

@awgymer can you check the explanation in #138 ? EDIT: Specifically the documentation change in: https://github.com/LUMC/pytest-workflow/pull/138/files

Would that be sufficient?

@rhpvorderman Yes I think that nicely covers the issues I was having and sufficiently explains the solution!

Thanks for getting back to us. Allright it will be in the documentation for the next release.