RuntimeError: rnn: hx is not contiguous when using multilayer-LSTM as network

Question

chaojie-fu opened this issue 3 years ago · 3 comments

Hi, I ran into the following error:

Traceback (most recent call last):
  File "./train.py", line 110, in launch_rlg_hydra
    runner.run({
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/torch_runner.py", line 139, in run
    self.run_train()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/torch_runner.py", line 125, in run_train
    agent.train()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1143, in train
    step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1023, in train_epoch
    self.train_central_value()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 521, in train_central_value
    return self.central_value_net.train_net()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 176, in train_net
    loss += self.train_critic(self.dataset[idx])
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 155, in train_critic
    loss = self.calc_gradients(input_dict)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 201, in calc_gradients
    values, _ = self.forward(batch_dict)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 136, in forward
    value, rnn_states = self.model(input_dict)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/network_builder.py", line 403, in forward
    out, states = self.rnn(out, states)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 691, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: rnn: hx is not contiguous

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

This happens when running the ShadowHandOpenAI_LSTM example from IsaacGym (Preview 3) with the layers parameter in the training config file ShadowHandPPOAsymmLSTM.yaml set to 2. It seems that rl_games doesn't currently support multilayer LSTMs. Is that true, or is it just a bug?

Answer 1 · 2021-12-05T18:34:57.000Z

Seems like a bug.
I'll take a look today, thanks.

Answer 2 · 2021-12-05T22:18:40.000Z

Fixed.
I just added a .contiguous() call here: input_dict['rnn_states'] = [s[:, gstart:gend, :].contiguous() for s in rnn_states]
It's already merged into master.
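
For anyone wondering why this only shows up with layers set to 2: the sketch below is a rough, PyTorch-free illustration of the stride arithmetic, not the actual rl_games or PyTorch code. It assumes the hidden state has shape (num_layers, num_envs, hidden_size), that a basic slice keeps the parent tensor's strides, and that dimensions of size 1 are ignored in the contiguity check, which is how PyTorch behaves as far as I understand it. The helper names and the sizes (8 envs, hidden size 4, slice 2:6) are made up for the example.

```python
def row_major_strides(shape):
    """Strides (in elements) of a freshly allocated row-major tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True if strides match a dense row-major layout; size-1 dims are ignored."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


def slice_batch(shape, strides, gstart, gend):
    """Shape and strides of s[:, gstart:gend, :]; the slice reuses the parent's strides."""
    layers, _, hidden = shape
    return (layers, gend - gstart, hidden), strides


# Hypothetical sizes, chosen only for illustration.
num_envs, hidden_size = 8, 4
for num_layers in (1, 2):
    shape = (num_layers, num_envs, hidden_size)
    sub_shape, sub_strides = slice_batch(shape, row_major_strides(shape), 2, 6)
    print(num_layers, sub_shape, sub_strides, is_contiguous(sub_shape, sub_strides))
```

With one layer the leading dimension has size 1, so its stride does not matter and the slice still counts as contiguous. With two layers, the step from layer 0 to layer 1 still spans all 8 envs while the slice holds only 4, so the layout has gaps, and .contiguous() copies it into a dense buffer.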

Answer 3 · 2021-12-06T07:14:57.000Z

It works now, thanks a lot for your quick response!