How to Specify Sequence Length for a Recurrent Network
Closed this issue · 4 comments
Amazing repo! I was wondering if you could help clarify some confusion I have about the recurrent layer implementation.
I found that the input to A2CBuilder.Network.forward() seems to have a sequence length of only 1, even though in the yaml it's a non-1 value.
I am currently on commit a33b6c4d ("easy fix (#145)"), up to date with the most recent master commit.
Steps to Reproduce
I ran this command:
python runner.py --train --file rl_games/configs/ppo_lunar_continiuos_torch.yaml
with a breakpoint at rl_games/algos_torch/network_builder.py:341~342.
The shapes of a_out, a_states, c_out, and c_states are all torch.Size([1, 16, 64]) (seq_length, batch_size, input_dim from the previous MLP).
However, the yaml file sets params.config.seq_length: 4, which I assumed to be the length of the RNN sequence.
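For reference, PyTorch recurrent layers with batch_first=False (the default) expect input shaped (seq_len, batch, input_size), so torch.Size([1, 16, 64]) does indicate a sequence length of 1. A minimal sketch of the two cases (the layer sizes here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# GRU with batch_first=False expects input shaped (seq_len, batch, input_size)
rnn = nn.GRU(input_size=64, hidden_size=64)

# One-step input, as observed at the breakpoint: seq_len == 1
x_step = torch.zeros(1, 16, 64)
out, h = rnn(x_step)
print(out.shape)  # torch.Size([1, 16, 64])

# Full-sequence input, matching seq_length: 4 from the yaml
x_seq = torch.zeros(4, 16, 64)
out, h = rnn(x_seq)
print(out.shape)  # torch.Size([4, 16, 64])
```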
I also didn't find a mechanism in the code that passes a sequence of inputs to the RNN.
Am I missing something, or is this feature not yet implemented?
Hi @rhklite, I'll write an answer tomorrow. I did some RNN rework not long ago :)
You might have found a new bug, but I have a quick idea for something you can test.
I found that the input to A2CBuilder.Network.forward() seems to only have a sequence of 1, even though in the yaml, it's a non 1 value.
Could you check whether it happens during the play_steps_rnn function? If so, it is expected.
During training you should see exactly seq_length == 4.
Thanks for the fast reply!
Regarding "I found that the input to A2CBuilder.Network.forward() seems to only have a sequence of 1, even though in the yaml, it's a non 1 value":
I verified that this happens only during A2CBase.play_steps_rnn(). When Network.forward() is called from A2CAgent.calc_gradients(), the RNN input gets exactly seq_length == 4.
Regarding "Could you check if it is done during play_steps_rnn function? If yes then it is expected.":
May I ask why the RNN isn't fed a sequence during experience collection? I was always under the impression that the network needed a sequence to make a correct prediction.
@rhklite sorry, I just got time to give an answer:
During inference or play you don't need to pass the whole sequence: I pass (current_obs, hidden_state) to the network and get (actions, next_hidden_state) back. During training I need to propagate gradients through the RNN, so I pass the whole sequence.
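The difference can be sketched like this (a minimal illustration using torch.nn.GRU; the names policy_rnn and hidden and the sizes are hypothetical, not the repo's actual API):

```python
import torch
import torch.nn as nn

policy_rnn = nn.GRU(input_size=64, hidden_size=64)  # illustrative sizes

# --- Rollout / inference: one observation at a time ---
# The hidden state carries the history, so seq_len == 1 suffices.
hidden = torch.zeros(1, 1, 64)            # (num_layers, batch, hidden_size)
for _ in range(4):                        # environment steps
    obs = torch.zeros(1, 1, 64)           # (seq_len=1, batch=1, input_size)
    out, hidden = policy_rnn(obs, hidden) # hidden carried to the next step

# --- Training: the whole stored sequence at once ---
# Gradients must flow back through time, so the full seq_length is passed.
seq = torch.zeros(4, 1, 64)               # (seq_len=4, batch=1, input_size)
init_hidden = torch.zeros(1, 1, 64)
out, _ = policy_rnn(seq, init_hidden)
print(out.shape)  # torch.Size([4, 1, 64])
```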
Thanks! Yeah, I realized later that you're right, since the RNN state from the previous step is already carried over at each step.