vatlab/sos

Deprecating -b BIN_DIR

BoPeng opened this issue · 6 comments

Right now we have an almost hidden feature that

  1. ~/.sos/bin would be before anything
  2. -b PATH would be prepended to $PATH.
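
As a rough illustration of the lookup order described in the two items above, here is a minimal Python sketch. The function name `effective_path` and its parameters are made up for this example and are not part of SoS; the ordering (`~/.sos/bin` first, then any `-b` directories, then the existing `$PATH`) is my reading of the description, not a copy of the actual implementation.

```python
import os


def effective_path(current_path, bin_dirs=(), sos_bin="~/.sos/bin", sep=os.pathsep):
    """Return a PATH string: sos_bin first, then the -b directories, then current_path.

    Hypothetical helper for illustration only; SoS may assemble PATH differently.
    """
    entries = [os.path.expanduser(sos_bin), *bin_dirs]
    if current_path:
        entries.extend(current_path.split(sep))
    return sep.join(entries)


if __name__ == "__main__":
    # Equivalent in spirit to: sos run workflow -b /path/to/R3.3
    print(effective_path("/usr/bin:/bin", ["/path/to/R3.3"],
                         sos_bin="/home/me/.sos/bin", sep=":"))
    # prints /home/me/.sos/bin:/path/to/R3.3:/usr/bin:/bin
```

Only the executable search order changes here, which is why this is less capable than an environment set up by something like module load.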

This was designed to specify which command to use when there are multiple versions, but it would be a lot less flexible and useful to do

sos run workflow -b /path/to/R3.3

than

sos run -r host_r3.3

when host_r3.3 is defined with

module load R/3.3
sos run ....

since module load can do a lot more than setting $PATH.

gaow commented

Well ... I use -b for many of my pipelines on my desktop, where I keep the particular executables for a particular analysis separately, instead of using more formal approaches such as a conda env (or, in your example, module load). I think it still has its appeal. However, I also agree that users can always do export PATH=BIN_DIR:$PATH, so I don't have a very strong opinion on this matter. I don't use ~/.sos/bin though.

The extended -r host feature that is being implemented in the 1319 branch allows you to predefine a number of "running environments" for your pipeline, including but not limited to conda activate, export PATH, and module load. It also allows for the more ambitious plan in vatlab/sos-notebook#262, where a sandbox could be created before sos run and something else done after sos run finishes. It is a much more flexible option than -b, so I am inclined to deprecate -b.
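
To make the idea of predefined "running environments" concrete, here is a minimal Python sketch of a template being rendered around the sos command. Everything in it is an assumption for illustration: the `HOST_TEMPLATES` dictionary, the `${command}` placeholder, the `my_env` conda environment name, and `render_host_template` are hypothetical and do not reflect the syntax used in the 1319 branch.

```python
from string import Template

# Hypothetical templates keyed by host name. The ${command} placeholder and the
# "my_env" environment name are illustrative assumptions, not SoS syntax.
HOST_TEMPLATES = {
    "host_r3.3": "module load R/3.3\n${command}",
    "conda_env": "conda activate my_env\n${command}",
}


def render_host_template(name, command, templates=HOST_TEMPLATES):
    """Return the shell script that would wrap `command` for the named host."""
    # safe_substitute leaves unrelated shell variables such as $PATH untouched.
    return Template(templates[name]).safe_substitute(command=command)


if __name__ == "__main__":
    print(render_host_template("host_r3.3", "sos run workflow"))
```

The point of the sketch is that the environment setup lives outside the workflow, so switching from module load to conda activate or export PATH means changing the template rather than the command line.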

gaow commented

This is nice -- sure, I have no objection to getting rid of -b.

While we are on it, shall we also allow for step-specific running environments, or do we have this mechanism already? For example, using some templates we could configure a step to run something like conda activate, export PATH, and module load ... This is also something I run into in benchmark applications where, for example, I have different git commits of the same software, installed in different conda environments, whose performance I want to compare against each other. I would like to switch between them dynamically from configuration files so I don't have to touch my workflow script. Dockerizing them would not work on a cluster ...

There are several flavors of this

  1. A multi-kernel online notebook (myjournal.com?) goes beyond specifying "kernels" for each cell and also allows "environments", which are seemingly docker images. It is a nice extension to SoS Notebook's multi-kernel approach (which I do not know how to achieve under the current SoS Notebook framework).

  2. However, even if we cannot do 1 under sos notebook, we could possibly do it under SoS, with something like

[step: env=refer_to_host_template]

  3. A "wrapper" script for actions. Conceptually, something like

[step]
sh:
    module load ... R3.3
    R:
      script

that lets R be executed inside the environment created by sh is not possible, but we could allow

R: env=conf_template

where conf_template has the same syntax as job_template etc.
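
One way to picture what an action-level env option would have to do is the following Python sketch: run the environment setup and the action's command in the same shell, so that whatever the setup exports is visible to the command. The helper `run_in_env` is hypothetical and is not how SoS actions are implemented; it also assumes a POSIX `sh` is available.

```python
import subprocess
import sys


def run_in_env(env_script, argv):
    """Run argv in a shell in which env_script has already been executed.

    Hypothetical helper for illustration; assumes a POSIX `sh` on PATH.
    """
    # `set -e` aborts if the environment setup fails; `exec "$@"` then replaces
    # the shell with the real command, which inherits the prepared environment.
    wrapper = "set -e\n" + env_script + '\nexec "$@"'
    result = subprocess.run(
        ["sh", "-c", wrapper, "sh", *argv],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for `module load ...`: export a variable and read it from a child process.
    out = run_in_env(
        "export SOS_DEMO=hello",
        [sys.executable, "-c", "import os; print(os.environ['SOS_DEMO'])"],
    )
    print(out.strip())
```

This is the flat equivalent of the nested sh/R block: the action is not placed inside sh in the script, but is wrapped by it at execution time.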

  4. For a similar issue we have options such as
task: prepend_path

which I really do not like but sometimes need. It could be unified as task: env=conf.

The bottom line is that what you are proposing is something I have had in mind for a while but have not figured out the best approach for yet, and all of these could be unified under the template approach that is being implemented for -r host.

gaow commented

R: env=conf_template
task: env=conf

I think this is good enough. And a different template should trigger reruns. But for a really short environment specification, can we simply write, e.g., env="conda activate ..."?
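
To sketch what "a different template should trigger reruns" and the inline short form could mean in practice, here is a small Python example. It is only a guess at one possible design: `ENV_TEMPLATES`, `resolve_env`, and `step_signature` are hypothetical names, and this is not how SoS computes its signatures.

```python
import hashlib

# Hypothetical named templates; the content is illustrative.
ENV_TEMPLATES = {"conf_template": "module load R/3.3"}


def resolve_env(env, templates=ENV_TEMPLATES):
    """Treat env as a template name if one is defined, else as an inline command."""
    return templates.get(env, env)


def step_signature(script, env=None, templates=ENV_TEMPLATES):
    """Hash the script together with the resolved environment text."""
    h = hashlib.sha256()
    h.update(script.encode())
    h.update(b"\0")  # separator so script and environment cannot run together
    h.update((resolve_env(env, templates) if env else "").encode())
    return h.hexdigest()


if __name__ == "__main__":
    same_script = "print('benchmark')"
    a = step_signature(same_script, "conf_template")
    b = step_signature(same_script, "conda activate bench")  # inline short form
    print(a != b)  # different environment text gives a different signature
```

Because the resolved text is hashed rather than the name, editing a template's content would also change the signature in this design.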

Also I think env is already a task/action option. We need something else.

env will be deprecated if we have the new feature. We do not need multiple features for (almost) the same purpose, even if one is a simplified version of another.

conf is possible, but we are using a template, not a configuration.
template is a bit long but acceptable; the problem is that it does not say what the template does here.

So overall env seems like a good name for such an option.