Allow for relative paths in the code (or adapt prepare_code)
giovannipizzi opened this issue · 4 comments
Currently, prepare_code will reuse the code if a code with the same label and computer is found.
However, I have the following usecase.
- I run
prepare_code("python")
as I want to use it, e.g. create a virtual env. It will create a code and put the absolute path (I think this is the current behaviour?) - I create a virtualenv
- I want now to run from inside the virtualenv. I'd like to just submit a ShellJob specifying
and python as an executable, so it finds the one in the path; but since there is already a code with that name, it will reuse it, but it will then put the "wrong" absolute path, obtained in step 1.
metadata = {'options': { 'prepend_text': f'source {VENV_NAME}/bin/activate', }
Of course I could create a new python code with the correct abs path, but this can become quickly annoying if I'm creating many venvs on the fly. Is there a way to specify that we want a relative path? (maybe it's currently a limitation in AiiDA?) Or to configure a new code with the same name?
Thanks for the report @giovannipizzi . The use-case of the relative path I just added last week in #73 , I just haven't released it yet since I merged it yesterday and am waiting for #70 as well. Should be releasing that soon.
That leaves the question how we can easily have multiple codes that reference the python
executable but load a different virtual env. Currently, uniqueness is determined by the full label executable@computer-label
. The prepare_code
is just a utility function used by the launch_shell_job
wrapper to automate creating a Code
node. But this is all just a wrapper around a the ShellJob
. You can still manually create a Code
node and pass that to launch_shell_job
directly.
I think for this use case, the user should simply create the desired code manually. We could add an option to prepare_code
to customize the prepend_text
, but then why just that and not any of the other computer/code properties? And how would the label then be determined? Just the command name as label will no longer suffice to make it unique. A random suffix? But that will make it difficult to recognize. If you have clever ideas for the UX I'd gladly consider them.
Thanks! I think that #73 solves my use case! I'll continue testing and let you know if it indeed works.
I'd like to reopen this. Still in the current code I realize that even if you pass resolve_command
, if a Code with the same name already was created, that will just be used.
I tried to adapt the code as follows:
def prepare_code(command: str, computer: Computer | None = None, resolve_command: bool = True) -> AbstractCode:
"""Prepare a code for the given command and computer.
This will automatically prepare the computer.
:param command: The command that the code should represent. Can be the relative executable name or absolute path.
:param computer: The computer on which the command should be run. If not defined the localhost will be used.
:param resolve_command: Whether to resolve the command to the absolute path of the executable. If set to ``True``,
the ``which`` command is executed on the target computer to attempt and determine the absolute path. Otherwise,
the command is set as the ``filepath_executable`` attribute of the created ``AbstractCode`` instance.
:return: A :class:`aiida.orm.nodes.code.abstract.AbstractCode` instance.
:raises ValueError: If ``resolve_command=True`` and the code fails to determine the absolute path of the command.
"""
computer = prepare_computer(computer)
resolve_string = "" if resolve_command else "-relativepath"
code_label = f'{command}{resolve_string}'
full_code_label = f'{code_label}@{computer.label}'
try:
code: AbstractCode = load_code(full_code_label)
resolve_idx = 0
while code.get_executable().is_absolute() != resolve_command:
resolve_idx += 1
code_label = f'{command}{resolve_string}-{resolve_idx}'
old_full_code_label = full_code_label
full_code_label = f'{code_label}@{computer.label}'
LOGGER.info(
'Code `%s` has %s path instead of %s, trying a new name %s.',
full_code_label,
'absolute' if code.get_executable().is_absolute() else 'relative',
'absolute' if resolve_command else 'relative',
old_full_code_label
)
code: AbstractCode = load_code(full_code_label)
except exceptions.NotExistent as exception:
LOGGER.info('No code exists yet for `%s`, creating it now.', code_label)
if resolve_command:
with computer.get_transport() as transport:
status, stdout, stderr = transport.exec_command_wait(f'which {command}')
executable = stdout.strip()
if status != 0:
raise ValueError(
f'failed to determine the absolute path of the command on the computer: {stderr}'
) from exception
else:
executable = command
code = ShellCode( # type: ignore[assignment]
label=code_label, computer=computer, filepath_executable=executable, default_calc_job_plugin='core.shell'
).store()
return code
I'm not making a PR as there are some outstanding questions.
- is it OK to have -relativepath` appended?
- Is it OK to append -1, -2, ... to make codes unique?
- now I changed the logic - the name essentially will never be the abspath if you do not specify the abspath yourself. But when doing this, I realized that with the current implementation in the main branch, you could have the following usecase:
- you first call
prepare_code('/bin/bash')
;/bin/bash@localhost
is created - you then call
prepare_code('bash')
. If the bash code does not exist, it will try to create it, but since by defaultresolve_command
is True, it will eventually create another/bin/bash@localhost
code, i.e. two codes with the same label
I don't think (as I think you say) that there is a robust way to do it without opening a connection via the transport every timeprepare_code
is called. Therefore, I think it's best not to use/bin/bash
as a path, if the user just wrotebash
; better to call itbash-1
. And we call it/bin/bash
only if the user passed that as the parameter. This is essentially how the code above works.
- you first call
If you think this is OK, feel free to adapt and reuse the code above.
Also, in this case, I would add one more test if the code is found, to ensure that the code.get_executable()
is indeed what you would have set. If resolve_command
is False, this is easy.
If it's False, I guess this is complex, if the user passed just the binary name and you want really to check if it's the same executable. Maybe this is something to tell users - if you care about having one executable, pass its full path. If you pass just the binary name, it will either try to get the full path, or reuse a code that "makes sense" and is already available on your AiiDA installation. One could still add an optional flag "force_check" but maybe it starts becoming too complex?
One could still add an optional flag "force_check" but maybe it starts becoming too complex?
This was exactly my point in my previous comment. Here you are just trying to deal with the different scenarios of absolute vs relative paths and already it is getting (too) complex. And this doesn't even start to take care of codes that require specific prepend texts or append texts, such as loading virtual environments.
The design of aiida-shell
intentionally reuses the CalcJob
interface and provides the ShellJob
implementation to give full access to the normal interface. The launch_shell_job
is just a simple wrapper that makes getting started even easier by not having to create the code. The main focus here is for new users. As soon as there are more experienced users with more specific requirements, they can change to directly use the ShellJob
interface. Or even still use launch_shell_job
but create the code themselves, because launch_shell_job
accepts a Code
instance for the command
.
I am currently not convinced that adding all this complexity is the right way to go. Instead, maybe we can just improve the documentation a bit on using custom codes, e.g. here: https://aiida-shell.readthedocs.io/en/latest/howto.html#defining-a-pre-configured-code I could add a snippet of how to actually setup such as code, or point to the aiida-core
docs.