allegroai/clearml-agent

environment variables in default_docker arguments of clearml.conf not passed to container on first run

jokokojote opened this issue · 2 comments

Description

Environment variables included in the default docker arguments in clearml.conf are not passed to the running container, even though they are automatically injected into the UI correctly. The same arguments work when rerunning the experiment, or when entered manually in ARGUMENTS before scheduling the experiment run.

To Reproduce

Steps to reproduce the behavior:

  1. Set some default docker arguments in clearml.conf, including environment variables, e.g.:
default_docker: {
    image: "pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel"

    # optional arguments to pass to docker image
    arguments: ["--network=host", "-e http_proxy=http://10.56.130.176:3128", "-e https_proxy=http://10.56.130.176:3128", "-e no_proxy=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"]
}
  2. Start the clearml server and agents, e.g.:
export CLEARML_HOST_IP=localhost
docker-compose -f /opt/clearml/docker-compose.yml up -d
clearml-agent daemon --docker --gpus 0 --queue 6d26cd747ec64700b963c2d04d26c14b
  3. Clone an experiment that was created previously, e.g. manually within a notebook (that is, the CONTAINER IMAGE and ARGUMENTS settings are empty for this experiment in the UI).

  4. Enqueue and run the experiment.

  5. The CONTAINER IMAGE and ARGUMENTS settings are immediately filled in correctly with the default values from clearml.conf, but the environment variables are not applied to the container. Network mode is "host" (which is correct), but the proxy environment variables are not set.

  6. Abort the experiment.

  7. Reschedule the experiment (the CONTAINER IMAGE and ARGUMENTS settings are kept as is, with the previously auto-injected values).

  8. Everything works as expected, including the container environment variables.
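
To confirm whether the proxy variables actually reached the container, a few lines at the top of the experiment script can print them. This is only a minimal sketch: the function name `proxy_env_report` is made up for illustration, and the variable names are the ones used in the config above.

```python
import os

# Proxy variables set via default_docker arguments in the config above.
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy")


def proxy_env_report(environ=None):
    """Return each proxy variable mapped to its value, or None if it is unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in PROXY_VARS}


if __name__ == "__main__":
    # Print the state of each variable as seen by the running process.
    for name, value in proxy_env_report().items():
        print(f"{name}={value!r}")
```

A value of `None` for all three on the first run, and real values after rescheduling, would match the behavior described above.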

Expected behavior

Default container environment variables from clearml.conf should be passed to the container on the first experiment run, just like the other docker arguments.

Environment

  • OS: Ubuntu 23.04
  • Browser: Chrome 119.0.6045.105
  • clearml-agent 1.6.1
  • Docker 24.0.7

PS: I tried
arguments: ["--network=host -e http_proxy=http://10.56.130.176:3128 -e https_proxy=http://10.56.130.176:3128 -e no_proxy=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16 ", ]
(one string) and
arguments: ["--network=host", "-e http_proxy=\"http://10.56.130.176:3128\"", "-e https_proxy=\"http://10.56.130.176:3128\"", "-e no_proxy=\"localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16\""]
(escaped quotes) in the config file as well; neither changed anything.

Hi @jokokojote, I think you simply specified the variables incorrectly. It should be:

arguments: ["--network=host", "-e", "http_proxy=http://10.56.130.176:3128", "-e", "https_proxy=http://10.56.130.176:3128", "-e", "no_proxy=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"]
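
The difference between the two spellings lies in how list elements become argument tokens. The sketch below is illustrative only and is not taken from clearml-agent's source. It assumes each list element is handed to `docker run` as one argument, so `"-e http_proxy=..."` arrives as a single token containing a space, whereas a shell-style split of the same text yields separate `-e` and `http_proxy=...` tokens. The helper name `normalize_docker_args` is hypothetical.

```python
import shlex


def normalize_docker_args(arguments):
    """Split every list element shell-style so each flag and value is its own token.

    Hypothetical helper for illustration; not part of clearml-agent.
    """
    tokens = []
    for element in arguments:
        tokens.extend(shlex.split(element))
    return tokens


# As written in the original report: flag and value share one list element.
reported = ["--network=host", "-e http_proxy=http://10.56.130.176:3128"]

# As suggested in the reply: flag and value are separate list elements.
suggested = ["--network=host", "-e", "http_proxy=http://10.56.130.176:3128"]

# Without splitting, the lists differ: the reported one carries a single
# token "-e http_proxy=..." that contains a space.
print(reported == suggested)                         # False
# After shell-style splitting, both yield the same argument tokens.
print(normalize_docker_args(reported) == suggested)  # True
```

If the arguments stored in the UI are split on whitespace before being used on a rerun, that could also explain why rescheduling worked. That is a guess and is not confirmed anywhere in this thread.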

@jkhenning thank you, works.