NaN episode rewards during baseline training
itstyren opened this issue · 5 comments
Problem:
I am encountering an issue while running the MeltingPot baseline Ray training model. The episode rewards I am getting are consistently NaN (Not-a-Number).
Steps to Reproduce:
python baselines/train/run_ray_train.py --num_gpus 1 --wandb True
The training args are set as:
# training
"seed": args.seed,
"rollout_fragment_length": 5, # Divide episodes into fragments of this many steps each during rollouts.
"train_batch_size": 40, # Batch size (batch * rollout_fragment_length) Trajectories of this size are collected from rollout workers and combined into a larger batch of train_batch_size for learning.
"sgd_minibatch_size": 32, # PPO further divides the train batch into minibatches for multi-epoch SGD
"disable_observation_precprocessing": True,
"use_new_rl_modules": False,
"use_new_learner_api": False,
"framework": args.framework, # torch or tensorflow
# agent model
"fcnet_hidden": (4, 4), # fully connected network
"post_fcnet_hidden": (16,), # Layer sizes after the fully connected torso.
"cnn_activation": "relu",
"fcnet_activation": "relu",
"post_fcnet_activation": "relu",
# == LSTM ==
"use_lstm": True,
"lstm_use_prev_action": True,
"lstm_use_prev_reward": False,
"lstm_cell_size": 2, # A cell, is an LSTM unit
"shared_policy": False,
Please let me know if there's any additional information or logs needed to diagnose this issue. Thank you for your assistance in resolving this problem.
I get the same issue, except I only get one NaN data point in total instead of one per step. If I run the evaluation script, however, it reports positive rewards for the agents, so I am also wondering what is going on there.
How many workers are you using for this experiment?
I have only seen this problem when there is a mismatch between train_batch_size, num_workers, and sgd_minibatch_size. Check if you get the same error if you reduce sgd_minibatch_size to 8 or 16.
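As a quick illustration, a rough consistency check over the values in this thread (the sampling arithmetic is an assumption about how RLlib assembles batches from worker fragments, not the baseline's own code):

```python
# Rough sanity check of the sizes involved (values from this thread;
# RLlib collects fragments from workers until at least train_batch_size
# env steps are gathered, then splits them into SGD minibatches).
num_workers = 2
rollout_fragment_length = 5
train_batch_size = 40
sgd_minibatch_size = 32
max_seq_len = 20  # RLlib model default

steps_per_round = num_workers * rollout_fragment_length  # 10 env steps per round
rounds = -(-train_batch_size // steps_per_round)         # 4 sampling rounds (ceil)
assert sgd_minibatch_size <= train_batch_size            # 32 <= 40: OK
assert sgd_minibatch_size >= max_seq_len                 # with an LSTM: 32 >= 20
```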
Thanks for your reply.
How many workers are you using for this experiment?
In this experiment, I have kept the default number of workers, num_workers=2.
Check if you get the same error if you reduce sgd_minibatch_size to 8 or 16.
It seems that reducing the sgd_minibatch_size below 20 is not feasible due to the following error raised by RLlib:
ValueError: `sgd_minibatch_size` (16) cannot be smaller than `max_seq_len` (20).
To investigate whether there is a discrepancy between train_batch_size, num_workers, and sgd_minibatch_size, I increased the sampling settings back to the provided defaults:
"rollout_fragment_length": 10,
"train_batch_size": 400,
"sgd_minibatch_size": 32,
Executed with the command python baselines/train/run_ray_train.py --num_gpus 1 --wandb True, the issue still persists, as shown in this wandb report.
Ah, correct about max_seq_len. You can change that by adding it to your config too, but sgd_minibatch_size should always be greater than or equal to max_seq_len.
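A sketch of what that change could look like (max_seq_len is a standard RLlib model option, default 20; the values here are illustrative, not the baseline's defaults):

```python
# Sketch: lower max_seq_len so a smaller sgd_minibatch_size is permitted.
config = {
    "sgd_minibatch_size": 16,
    "model": {
        "use_lstm": True,
        "max_seq_len": 16,  # keep sgd_minibatch_size >= max_seq_len
    },
}
```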
Could you change how long you train? So, change to 10 iterations or so and see if you get NaN for all iterations or only for the first few?
Thanks, @rstrivedi. I can confirm that this problem persists even during longer runs, as demonstrated in this report.
However, I just noticed a discrepancy between num_agent_steps_sampled and num_env_steps_sampled for the default settings, which are as follows:
| Metric | Value |
| --- | --- |
| num_agent_steps_sampled | 3,200 |
| num_agent_steps_trained | 3,200 |
| num_env_steps_sampled | 400 |
| num_env_steps_trained | 400 |
When I modify train_batch_size to 3200, I am able to obtain accurate episode rewards. I'm uncertain if this is where the issue originates. Do you have any suggestions or insights regarding this matter? Thanks!
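For reference, the 8x gap in the table is what a multi-agent substrate produces: each env step yields one transition per focal player, so 3,200 / 400 suggests 8 players (an inference, not taken from the substrate config). The fix would also be consistent with a common RLlib behavior: episode_reward_mean is reported as NaN for any iteration in which no episode finished, so a train_batch_size smaller than the episode length never completes one. A back-of-the-envelope sketch:

```python
# Sketch: step accounting implied by the table above.
players = 3200 // 400  # 8 agent steps per env step (inferred, substrate-dependent)
episode_len = 1000     # hypothetical episode length; not taken from the substrate

for train_batch_size in (400, 3200):  # env steps sampled per training iteration
    print(train_batch_size, train_batch_size >= episode_len)
# 400  -> False: no episode finishes in an iteration, episode_reward_mean stays NaN
# 3200 -> True: full episodes finish, real episode rewards are reported
```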