RuntimeError: rnn: hx is not contiguous when using multilayer-LSTM as network
chaojie-fu opened this issue · 3 comments
chaojie-fu commented
Hi, I ran into the following error:
Traceback (most recent call last):
  File "./train.py", line 110, in launch_rlg_hydra
    runner.run({
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/torch_runner.py", line 139, in run
    self.run_train()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/torch_runner.py", line 125, in run_train
    agent.train()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1143, in train
    step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 1023, in train_epoch
    self.train_central_value()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/common/a2c_common.py", line 521, in train_central_value
    return self.central_value_net.train_net()
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 176, in train_net
    loss += self.train_critic(self.dataset[idx])
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 155, in train_critic
    loss = self.calc_gradients(input_dict)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 201, in calc_gradients
    values, _ = self.forward(batch_dict)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/central_value.py", line 136, in forward
    value, rnn_states = self.model(input_dict)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/rl_games/algos_torch/network_builder.py", line 403, in forward
    out, states = self.rnn(out, states)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fuchaojie/DATA_UBUNTU/Isaac_env/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 691, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: rnn: hx is not contiguous
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
This happens when running the ShadowHandOpenAI_LSTM example from IsaacGym (Preview 3) with the layers parameter in the training config file ShadowHandPPOAsymmLSTM.yaml set to 2. It seems that rl_games doesn't currently support multilayer LSTMs. Is that true, or is it just a bug?
Denys88 commented
Seems like a bug.
I'll take a look today, thanks.
Denys88 commented
Fixed.
I just added a .contiguous() call here: input_dict['rnn_states'] = [s[:, gstart:gend, :].contiguous() for s in rnn_states]
I've already merged it into master.
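For anyone wondering why this only shows up with more than one layer, here is a minimal sketch of the slicing behaviour. It is not the rl_games code: the sizes and the helper slice_states are made up for illustration, and it assumes the states have the usual (num_layers, batch, hidden) layout that nn.LSTM expects. Slicing along the batch dimension leaves a single-layer state contiguous, because the layer dimension has size 1, but it makes a two-layer state non-contiguous. As far as I can tell the "hx is not contiguous" check is only enforced on some backends (the GPU path in the trace above), so this sketch inspects is_contiguous() directly instead of trying to reproduce the exception on CPU.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
num_envs, hidden = 8, 16
gstart, gend = 2, 6  # slice of the batch dimension, as in the fix above


def slice_states(num_layers, make_contiguous):
    # LSTM states (h, c), each shaped (num_layers, batch, hidden).
    rnn_states = [torch.zeros(num_layers, num_envs, hidden) for _ in range(2)]
    sliced = [s[:, gstart:gend, :] for s in rnn_states]
    if make_contiguous:
        # Copy each slice into its own densely packed memory.
        sliced = [s.contiguous() for s in sliced]
    return sliced


one_layer = slice_states(1, make_contiguous=False)
two_layer = slice_states(2, make_contiguous=False)
fixed = slice_states(2, make_contiguous=True)

print(one_layer[0].is_contiguous())  # layer dim has size 1, so still contiguous
print(two_layer[0].is_contiguous())  # slice skips rows between the two layers
print(fixed[0].is_contiguous())

# The contiguous states can be passed to a two-layer LSTM as (h_0, c_0).
lstm = torch.nn.LSTM(input_size=hidden, hidden_size=hidden, num_layers=2)
x = torch.zeros(3, gend - gstart, hidden)  # (seq_len, batch, input_size)
out, (h, c) = lstm(x, tuple(fixed))
```

Calling .contiguous() copies the slice, which costs a little memory per minibatch, but it is a no-op when the tensor is already contiguous, so it should be harmless for the single-layer case.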
chaojie-fu commented
It works now, thanks a lot for your quick response!