Denys88/rl_games

How to Specify Sequence Length for a Recurrent Network

Closed this issue · 4 comments

Amazing repo! I was wondering if you could help me clear up some confusion I have around the recurrent layer implementation.

I found that the input to A2CBuilder.Network.forward() seems to have a sequence length of only 1, even though in the YAML it is set to a non-1 value.

I am currently on commit a33b6c4d ("easy fix", #145), up to date with the most recent master commit.

Steps to Reproduce

I ran this command:

python runner.py --train --file rl_games/configs/ppo_lunar_continiuos_torch.yaml

with a breakpoint at rl_games/algos_torch/network_builder.py:341-342.

The shapes of a_out, a_states, c_out, and c_states are all torch.Size([1, 16, 64]), i.e. (seq_length, batch_size, input_dim from the previous MLP).

However, in the YAML file, params.config.seq_length is 4, which I assumed to be the length of the RNN sequence.
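
For reference, the relevant parts of the config look roughly like this (a sketch from memory, so the exact values and surrounding keys may differ from the actual file):

```yaml
params:
  network:
    # ... mlp settings omitted ...
    rnn:
      name: lstm
      units: 64
      layers: 1
  config:
    # ... ppo settings omitted ...
    seq_length: 4  # what I assumed controls the RNN sequence length
```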

I also didn't find a mechanism in the code that passes in a sequence of inputs to the RNN.

Am I missing something, or is this feature not yet implemented?

Hi @rhklite, I'll write an answer tomorrow. I made some RNN rework not long ago :)
You might have found a new bug, but I have a quick idea for what you can test.

> I found that the input to A2CBuilder.Network.forward() seems to have a sequence length of only 1, even though in the YAML it is set to a non-1 value.

Could you check whether this happens during the play_steps_rnn function? If so, it is expected.
During training you should see exactly seq_length == 4.
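
Roughly, the shapes you should see at that breakpoint in each phase (the (seq, batch, features) layout is the PyTorch default for nn.LSTM; the training batch dimension depends on the minibatch settings):

```python
# During A2CBase.play_steps_rnn (experience collection), the network is
# stepped one timestep at a time:
#   rnn input: (1, batch_size, features), e.g. torch.Size([1, 16, 64])

# During A2CAgent.calc_gradients (training), stored transitions are fed
# back as sequences so gradients can flow through time:
#   rnn input: (seq_length, batch, features), with seq_length == 4 here
```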

Thanks for the fast reply!

Regarding:

> I found that the input to A2CBuilder.Network.forward() seems to have a sequence length of only 1, even though in the YAML it is set to a non-1 value.

I verified that this only happens during A2CBase.play_steps_rnn(). When Network.forward() is called from A2CAgent.calc_gradients(), the RNN input does have exactly seq_length == 4.

Regarding:

> Could you check whether this happens during the play_steps_rnn function? If so, it is expected.

May I ask why the RNN isn't fed a sequence during experience collection? I was always under the impression that the network needed a sequence to make a correct prediction.

@rhklite sorry, I just got time to give a proper answer:
During inference (play) you don't need to pass the whole sequence: I pass (current_obs, hidden_state) to the network and get (actions, next_hidden_state) back. During training I need to propagate gradients through the RNN, so I pass the whole sequence.
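
A minimal sketch of the two patterns, with a plain nn.LSTM standing in for the rl_games network (the names and shapes here are illustrative, not the actual rl_games code):

```python
import torch
import torch.nn as nn

units, batch_size, seq_length = 64, 16, 4
rnn = nn.LSTM(units, units)  # input shape: (seq, batch, features)

# Inference / play: one observation per step, hidden state carried over.
hidden = None  # nn.LSTM treats None as a zero initial state
for _ in range(8):
    current_obs = torch.randn(1, batch_size, units)
    out, hidden = rnn(current_obs, hidden)  # actions would come from out

# Training: the whole stored sequence is passed at once so gradients can
# propagate back through time over seq_length steps.
obs_seq = torch.randn(seq_length, batch_size, units)
out_seq, _ = rnn(obs_seq)      # (seq_length, batch_size, units)
loss = out_seq.pow(2).mean()   # placeholder loss
loss.backward()
```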

Thanks! Yeah, I realized later that you are right, since the RNN state from the previous step is already carried over at each step.
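
A quick way to convince yourself of this (a standalone check, not rl_games code): stepping an LSTM one timestep at a time while carrying the hidden state forward gives the same outputs as feeding it the full sequence at once.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_length, batch_size, units = 4, 16, 64
lstm = nn.LSTM(units, units)
obs_seq = torch.randn(seq_length, batch_size, units)

# Full-sequence pass (training-style).
full_out, _ = lstm(obs_seq)

# Step-by-step pass (rollout-style), carrying the hidden state forward.
hidden = None
steps = []
for t in range(seq_length):
    out, hidden = lstm(obs_seq[t:t + 1], hidden)
    steps.append(out)

print(torch.allclose(full_out, torch.cat(steps), atol=1e-6))  # True
```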