nico-bohlinger/one_policy_to_run_them_all

Issues with training with slurm

Closed this issue · 2 comments

Hi! When running the command sbatch experiment.sh, I am facing the following errors:

sbatch: error: resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
sbatch: error: fetch_config: DNS SRV lookup failed
sbatch: error: _establish_config_source: failed to fetch config
sbatch: fatal: Could not establish a configuration source

These errors show up even though I did not add any new robots or make any changes to the repo. Has anyone faced a similar issue? Thank you!

These look like general Slurm errors specific to the Slurm setup on your cluster. Do other Slurm job scripts run for you on your cluster? Maybe one of the SLURM arguments that I use (at the top of experiment.sh) doesn't work on your Slurm setup.
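The messages above suggest that sbatch found no configuration source and fell back to a DNS SRV lookup, which also failed. A minimal sketch for checking that from the login node follows. The function name `find_slurm_conf` is hypothetical, and `/etc/slurm/slurm.conf` is only a common default location (the actual path depends on how Slurm was built on your cluster).

```shell
#!/usr/bin/env bash
# Sketch: check whether this machine appears to have a usable Slurm configuration.
# Assumption: /etc/slurm/slurm.conf is a common default path, not a guaranteed one.

# find_slurm_conf (hypothetical helper): print the first readable slurm.conf candidate.
find_slurm_conf() {
    local candidate
    # SLURM_CONF, when set, overrides the default location of slurm.conf.
    for candidate in "${SLURM_CONF:-}" /etc/slurm/slurm.conf; do
        if [ -n "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if conf_path="$(find_slurm_conf)"; then
    echo "Found Slurm config: $conf_path"
else
    echo "No readable slurm.conf found in the locations checked."
fi

# scontrol ping asks the controller (slurmctld) whether it is responding.
if command -v scontrol >/dev/null 2>&1; then
    scontrol ping || echo "scontrol ping reported a problem."
else
    echo "scontrol is not on PATH; the Slurm client tools may not be installed here."
fi
```

If no config is found and the controller cannot be reached, the problem is most likely the machine's Slurm installation rather than the arguments in experiment.sh.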

Hi, apologies for the delayed response. I ran into some errors with the Slurm setup, so I ended up using bash to run it instead.
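Running the script with bash works because bash treats the `#SBATCH` lines as ordinary comments. This means that any resource requests they make are ignored. A rough sketch of a wrapper that picks between the two launchers is below. The names `choose_launcher` and `run_experiment` are hypothetical, and it assumes that `sinfo` exits non-zero when it cannot reach the controller.

```shell
#!/usr/bin/env bash
# Sketch: submit through Slurm when it is usable, otherwise run the script directly.
# Assumption: sinfo exits non-zero when no controller/config is reachable.

# choose_launcher (hypothetical helper): print "sbatch" or "bash".
choose_launcher() {
    if command -v sbatch >/dev/null 2>&1 && sinfo >/dev/null 2>&1; then
        echo sbatch
    else
        echo bash
    fi
}

# run_experiment (hypothetical helper): launch the given script with the chosen launcher.
run_experiment() {
    local script="$1"
    local launcher
    launcher="$(choose_launcher)"
    echo "Launching $script with: $launcher"
    "$launcher" "$script"
}
```

After sourcing the snippet, `run_experiment experiment.sh` would submit the job on a working cluster and run it locally otherwise.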