rail-berkeley/softlearning

Nan error in Humanoid

varun-intel opened this issue · 2 comments

I'm seeing NaN values in the actions when I try to train an SAC policy on Humanoid.
This is the command I'm running:
python3 softlearning/scripts/console_scripts.py run_example_local examples.development --universe "gym" --domain "Humanoid" --task "v2" --gpu 1 --trial-gpus 1

The error appears very early, before the first epoch completes. It seems to originate in the model created by shift_and_log_scale_diag_net: its input, conditions, contains no NaN values, but its output, shift_and_log_scale_diag, does.
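The check described above (finite input, non-finite output) is a good way to localize where NaNs first appear. In TensorFlow itself, tf.debugging.check_numerics can be wrapped around each tensor for this. The framework-free sketch below shows the same logic on plain lists. It assumes, hypothetically, that the network output has length 2 * action_dim and is split in half into shift and log_scale_diag; locate_nan and split_shift_and_log_scale are illustrative helpers, not softlearning functions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def first_non_finite(values: Sequence[float]) -> Optional[int]:
    """Return the index of the first NaN/inf entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


def split_shift_and_log_scale(
    output: Sequence[float], action_dim: int
) -> Tuple[List[float], List[float]]:
    """Split a 2 * action_dim output into (shift, log_scale_diag).

    Assumed layout (hypothetical): first half is the shift, second half the log scale.
    """
    if len(output) != 2 * action_dim:
        raise ValueError("expected an output of length 2 * action_dim")
    return list(output[:action_dim]), list(output[action_dim:])


def locate_nan(
    conditions: Sequence[float], output: Sequence[float], action_dim: int
) -> str:
    """Report the first stage (input, shift, or log scale) holding a non-finite value."""
    if first_non_finite(conditions) is not None:
        return "conditions"
    shift, log_scale_diag = split_shift_and_log_scale(output, action_dim)
    if first_non_finite(shift) is not None:
        return "shift"
    if first_non_finite(log_scale_diag) is not None:
        return "log_scale_diag"
    return "ok"
```

In the situation reported here, locate_nan would return "shift" or "log_scale_diag" rather than "conditions", pointing at the network (or its weights) rather than the observations.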

I am using tensorflow-gpu version 1.15.

What's your tensorflow-probability version? Older versions have a numerical-instability issue in the squash head of the policy (i.e., the tfp.bijectors.Tanh bijector): tensorflow/probability#318.
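To illustrate the kind of instability meant here: the tanh squash needs the log-determinant of its Jacobian, log(1 - tanh(x)^2). Computed directly, this becomes log(0) = -inf as soon as tanh(x) rounds to exactly 1, which can then propagate NaNs through the loss. The sketch below, in plain Python floats, compares the direct formula with the algebraically equivalent form 2 * (log 2 - x - softplus(-2x)), which stays finite. It illustrates the technique and is not TFP's actual bijector code; the function names are hypothetical.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def tanh_log_det_naive(x: float) -> float:
    """log(1 - tanh(x)^2) computed directly; breaks once tanh(x) rounds to +/-1."""
    t = math.tanh(x)
    v = 1.0 - t * t
    return math.log(v) if v > 0.0 else -math.inf


def tanh_log_det_stable(x: float) -> float:
    """Same quantity via log(1 - tanh(x)^2) = 2 * (log 2 - x - softplus(-2x))."""
    return 2.0 * (math.log(2.0) - x - softplus(-2.0 * x))
```

For moderate inputs such as x = 0.5 the two agree, but at x = 30 the direct version returns -inf while the stable one returns roughly -58.61. With float32 tensors, as in the policy, tanh saturates at much smaller |x|, so the direct formula fails even sooner.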

I was mistakenly using MuJoCo 1.5 instead of MuJoCo 2; the error goes away with the newer package.
The issue is fixed, but I'd be curious to know whether you've seen something similar with older MuJoCo versions.
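A small guard can catch this mix-up before training starts. The sketch assumes the binding is mujoco-py, whose release numbers follow the MuJoCo version they target (1.50.x for MuJoCo 1.5, 2.0.x for MuJoCo 2.0); parse_major_minor and require_mujoco2_binding are hypothetical helpers that only inspect a version string.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a version string such as '2.0.2.13'."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version string: {version!r}")
    return int(parts[0]), int(parts[1])


def require_mujoco2_binding(binding_version: str) -> None:
    """Raise if the binding version looks like it targets MuJoCo 1.x.

    Assumption: mujoco-py's major version matches the MuJoCo major version.
    """
    major, _minor = parse_major_minor(binding_version)
    if major < 2:
        raise RuntimeError(
            f"mujoco-py {binding_version} targets MuJoCo 1.x; MuJoCo 2 is expected"
        )
```

The installed version string could be obtained with importlib.metadata.version("mujoco-py") on Python 3.8+, assuming that is the distribution name in your environment.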