why input last action to lstm policy network
junhuang-ifast opened this issue · 1 comment
junhuang-ifast commented
Hi, may I ask why, for SAC, you feed the last action into the LSTM policy network? This is not usually done in a standard (non-LSTM) policy network. Is this based on any studies or your own design? And does it improve learning compared to using only the state as input? Thanks.
https://github.com/quantumiracle/SOTA-RL-Algorithms/blob/e3fc2de8493ea974081a10112c5005d755172787/common/policy_networks.py#L319
quantumiracle commented
Hi,
This is common practice for LSTM policies: conditioning on the previous action helps the recurrent network infer the dynamics of the environment, e.g.:
http://rll.berkeley.edu/deeprlworkshop/papers/rdpg.pdf
https://arxiv.org/pdf/1710.06537.pdf
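To illustrate the idea discussed above, here is a minimal PyTorch sketch (not the repo's actual implementation; class and parameter names are hypothetical) of a recurrent policy that concatenates the state with the previous action before the LSTM, so the hidden state can summarize the interaction history:

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Recurrent policy that conditions on (state, last_action) so the
    LSTM hidden state can infer partially observed env dynamics."""

    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        # Input is the concatenation of the current state and the
        # action taken at the previous timestep.
        self.lstm = nn.LSTM(state_dim + action_dim, hidden_dim,
                            batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state, last_action, hidden=None):
        # state:       (batch, seq_len, state_dim)
        # last_action: (batch, seq_len, action_dim)
        x = torch.cat([state, last_action], dim=-1)
        out, hidden = self.lstm(x, hidden)
        # tanh squashes the output to a bounded action range
        return torch.tanh(self.head(out)), hidden

policy = LSTMPolicy(state_dim=4, action_dim=2)
s = torch.randn(1, 5, 4)       # a 5-step trajectory of states
a_prev = torch.randn(1, 5, 2)  # the actions from the previous step
action, h = policy(s, a_prev)
print(action.shape)            # torch.Size([1, 5, 2])
```

A feedforward policy with only the state as input has no way to disambiguate states that look identical but arose from different histories; feeding the last action (together with recurrence) gives the network the information needed to learn such dynamics.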