LucasAlegre/sumo-rl

Getting NaN for env_runners/episode_len_mean

Closed this issue · 3 comments

Hi, thanks for making this library. I am trying to train a model with the PPO algorithm. I am using some newer packages, so I first fixed the errors that were occurring. The training is now running; however, I am getting this:

(screenshot: training output reporting `nan` for `env_runners/episode_len_mean`)

I read that an episode has to end before the metric can be calculated, and as far as I can tell the episode is being ended. However, nothing changed. I also read that you may need to update the step/reset functions; I tried that as well but made no progress. Can you help me out?

I haven't made any changes other than to this config variable.

```python
import os

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env=env_name)
    .rollouts(num_rollout_workers=15, rollout_fragment_length="auto")
    .training(
        train_batch_size=512,
        lr=2e-5,
        gamma=0.95,
        lambda_=0.9,
        use_gae=True,
        clip_param=0.4,
        grad_clip=None,
        entropy_coeff=0.1,
        vf_loss_coeff=0.25,
        sgd_minibatch_size=64,
        num_sgd_iter=10,
    )
    .debugging(log_level="ERROR")
    .framework(framework="torch")
    .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "1")))
)
```
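
For context, `env_name` is not defined in the snippet above; with sumo-rl it is usually registered as a PettingZoo environment wrapped for RLlib, roughly along the lines of the sketch below. The environment name, file paths, and `num_seconds` value here are placeholders, not the reporter's actual setup:

```python
# Rough sketch of registering a sumo-rl environment for RLlib.
# The env name, file paths, and num_seconds are placeholder assumptions.
import sumo_rl
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.tune.registry import register_env

env_name = "my_sumo_grid"  # hypothetical name

register_env(
    env_name,
    lambda _: ParallelPettingZooEnv(
        sumo_rl.parallel_env(
            net_file="path/to/my.net.xml",    # placeholder
            route_file="path/to/my.rou.xml",  # placeholder
            use_gui=False,
            num_seconds=80000,  # simulated seconds per episode (see discussion below)
        )
    ),
)
```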

Things I have tried so far: keeping `num_rollout_workers` at its default as well as values between 1 and 15 (I have 16 cores), and making changes to the env file. Nothing has worked for me so far.

Thanks for looking into the issue.

Hi, the nan is because the episode has not ended yet. Typically, each episode in sumo-rl is very long (you can reduce it if you want). Your library (RLlib) logs the metrics during the episode, but since no episode has finished yet, it cannot compute the episode length or return. If you let it run until an episode ends, the metric will update accordingly.
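
To make the relationship concrete: in sumo-rl the episode length is controlled by the `num_seconds` environment parameter, and one RL step is taken every `delta_time` simulated seconds (5 by default), so the metric stays `nan` until at least one full episode has been collected. A back-of-the-envelope sketch, assuming those defaults:

```python
# Back-of-the-envelope: how many RL steps one episode takes before RLlib can
# report episode_len_mean (assumes sumo-rl's default delta_time of 5 seconds).
num_seconds = 80000   # simulated seconds per episode
delta_time = 5        # simulated seconds between agent actions
steps_per_episode = num_seconds // delta_time
print(steps_per_episode)  # 16000 RL steps before the first episode finishes
```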

Hey Lucas, thanks for your prompt answer. Can you give me an ideal length for an episode? The algorithm terminates without finishing any episode if I set it to 80000 seconds.

There is no "ideal length". I used to use high values because the real world is not episodic, and I wanted a continuous simulation. I would suggest you try something like 5000 seconds, which would correspond to 5000/5 = 1000 RL time steps per episode.
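
Following that suggestion, shortening the episode just means lowering `num_seconds` when the environment is created. A minimal sketch, assuming the PettingZoo parallel API and placeholder file paths:

```python
# Minimal sketch: a 5000-second episode with the default delta_time of 5
# gives roughly 5000 / 5 = 1000 RL steps per episode. Paths are placeholders.
import sumo_rl

env = sumo_rl.parallel_env(
    net_file="path/to/my.net.xml",    # placeholder
    route_file="path/to/my.rou.xml",  # placeholder
    use_gui=False,
    num_seconds=5000,  # episode ends after 5000 simulated seconds
    delta_time=5,      # one RL step every 5 simulated seconds
)
```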