Issues with training with slurm
Hi! When running the command sbatch experiment.sh, I am facing the following errors:
sbatch: error: resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
sbatch: error: fetch_config: DNS SRV lookup failed
sbatch: error: _establish_config_source: failed to fetch config
sbatch: fatal: Could not establish a configuration source
These errors show up even though I did not add any new robots or make any changes to the repo. Has anyone faced a similar issue? Thank you!
These look like general Slurm errors specific to the Slurm setup on your cluster. Do other Slurm job scripts run for you on your cluster? It may be that one of the SLURM arguments I use (at the top of experiment.sh) doesn't work with your Slurm setup.
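For what it's worth, this particular sequence of messages (DNS SRV lookup failed, could not establish a configuration source) typically appears when sbatch cannot find a slurm.conf and falls back to locating the controller through DNS. A minimal sketch for checking that on the machine where you run sbatch is below. The helper name find_slurm_conf is hypothetical, and the two /etc paths are only common defaults that may not match your install; SLURM_CONF is the environment variable the Slurm client tools consult for an alternate config location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first readable slurm.conf among a few likely
# locations. The /etc paths are common defaults, NOT guaranteed for your cluster.
find_slurm_conf() {
    local candidate
    for candidate in "${SLURM_CONF:-}" /etc/slurm/slurm.conf /etc/slurm-llnl/slurm.conf; do
        if [ -n "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if conf=$(find_slurm_conf); then
    echo "Found Slurm config: $conf"
else
    echo "No readable slurm.conf found in the checked locations"
fi

# Only contact the controller if the Slurm client tools are installed here.
if command -v scontrol >/dev/null 2>&1; then
    scontrol ping || echo "slurmctld is not reachable from this machine"
fi
```

If no config is found, you are probably on a machine that is not set up as a submit node (for example a laptop or a container without the cluster's Slurm config), and none of the sbatch arguments in experiment.sh will matter until that is fixed.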
Hi, apologies for the delayed response. I ran into some errors with the Slurm setup, so I ended up running the script with bash instead.