NaN error in Humanoid
varun-intel opened this issue · 2 comments
I'm seeing nan values in the actions when I try to train an SAC policy on Humanoid.
This is the command I'm running:
python3 softlearning/scripts/console_scripts.py run_example_local examples.development --universe "gym" --domain "Humanoid" --task "v2" --gpu 1 --trial-gpus 1
The error appears very early, before the first epoch has completed. It seems to originate in the model created by shift_and_log_scale_diag_net: the conditions input doesn't contain any NaN values, but the shift_and_log_scale_diag output does.
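For anyone trying to narrow down a similar problem, here is a minimal NumPy sketch of the kind of check described above: given the arrays going into and coming out of a network, report which ones contain NaN or inf entries. The helper name find_nonfinite and the sample values are hypothetical and not part of softlearning; the arrays are made-up stand-ins for the real tensors.

```python
import numpy as np


def find_nonfinite(named_arrays):
    """Return {name: number of NaN/inf entries} for arrays that contain any.

    Hypothetical debugging helper, not part of softlearning.
    """
    report = {}
    for name, values in named_arrays.items():
        bad = ~np.isfinite(np.asarray(values, dtype=np.float64))
        if bad.any():
            report[name] = int(bad.sum())
    return report


# Made-up stand-ins for the network input and output discussed above.
conditions = np.array([[0.1, -0.3], [0.5, 0.2]])
shift_and_log_scale_diag = np.array([[0.0, np.nan], [1.2, -0.7]])

report = find_nonfinite({
    "conditions": conditions,
    "shift_and_log_scale_diag": shift_and_log_scale_diag,
})
print(report)
```

Only arrays with at least one non-finite entry show up in the report, so an empty dict means everything checked was finite.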
I am using tensorflow-gpu version 1.15.
What's your tensorflow-probability version? With older versions, there's an issue with the numerical instability of the squash head of the policy (i.e. the tfp.bijectors.Tanh bijector): tensorflow/probability#318.
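To illustrate the general kind of instability a tanh squash can run into, here is a small NumPy sketch (not TensorFlow Probability's actual code, and not necessarily the exact failure reported in that issue). It compares a naive log-det-Jacobian for tanh, log(1 - tanh(x)^2), with the algebraically equivalent form 2 * (log 2 - x - softplus(-2x)). The function names are made up for this example.

```python
import numpy as np


def naive_log_det_jacobian(x):
    # log(1 - tanh(x)^2): for large |x|, tanh(x) rounds to +/-1,
    # so the argument of the log becomes 0 and the result is -inf.
    return np.log(1.0 - np.tanh(x) ** 2)


def stable_log_det_jacobian(x):
    # Same quantity rewritten as 2 * (log 2 - x - softplus(-2x)),
    # with softplus(z) = log(1 + exp(z)) computed via logaddexp.
    return 2.0 * (np.log(2.0) - x - np.logaddexp(0.0, -2.0 * x))


x = np.array([0.0, 1.0, 20.0, -20.0], dtype=np.float64)

# Silence the divide-by-zero warning raised by log(0) in the naive version.
with np.errstate(divide="ignore"):
    naive = naive_log_det_jacobian(x)
stable = stable_log_det_jacobian(x)

print("naive: ", naive)
print("stable:", stable)
```

For moderate inputs the two forms agree, while for large |x| only the rewritten form stays finite. Non-finite values like these can propagate through the loss and gradients during training.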
I was incorrectly using mujoco 1.5 instead of mujoco 2; the error goes away with the newer package.
The issue is fixed, but I'd be curious to know if you've seen something similar with older mujoco versions.
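Since the root cause here turned out to be a package version, a quick way to collect the versions discussed in this thread is sketched below using only the standard library (Python 3.8+). The distribution names are assumptions about how the packages were installed with pip (for example, mujoco-py as the Python binding for MuJoCo); adjust them to match your environment.

```python
from importlib import metadata


def installed_versions(distribution_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distribution_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed pip distribution names; change these to match your setup.
packages = ["tensorflow-gpu", "tensorflow-probability", "mujoco-py"]

for name, version in installed_versions(packages).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report makes it easier to spot a mismatch such as the MuJoCo one above.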