quantumiracle/Popular-RL-Algorithms

why input last action to lstm policy network

junhuang-ifast opened this issue · 1 comment

Hi, may I ask why, for SAC, you input the last action into the LSTM policy network, when this is not usually done in a normal (non-LSTM) policy network? Is this based on any studies, or on your own work? And does it improve learning compared to using states as the only input? Thanks.
https://github.com/quantumiracle/SOTA-RL-Algorithms/blob/e3fc2de8493ea974081a10112c5005d755172787/common/policy_networks.py#L319

Hi,
Feeding the last action into an LSTM policy is common practice: together with the current state, the (state, last action) pair lets the recurrent network implicitly infer the environment dynamics, which helps under partial observability or randomized dynamics. See:
Memory-based control with recurrent neural networks (RDPG): http://rll.berkeley.edu/deeprlworkshop/papers/rdpg.pdf
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization: https://arxiv.org/pdf/1710.06537.pdf
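
For illustration, here is a minimal sketch of the idea: the policy concatenates the current state with the previous action before the LSTM, so the hidden state can accumulate information about the transition dynamics. This is an assumption-laden toy example (layer sizes, names, and the tanh squashing are illustrative), not the exact architecture in common/policy_networks.py:

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Toy recurrent policy conditioned on (state, last_action).

    Illustrative sketch only -- dimensions and layer choices are
    assumptions, not the repo's actual implementation.
    """
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        # Embed the concatenated (state, last_action) before the LSTM.
        self.fc_in = nn.Linear(state_dim + action_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.fc_out = nn.Linear(hidden_dim, action_dim)

    def forward(self, state, last_action, hidden=None):
        # state: (batch, seq_len, state_dim)
        # last_action: (batch, seq_len, action_dim)
        x = torch.cat([state, last_action], dim=-1)
        x = torch.relu(self.fc_in(x))
        # The LSTM hidden state carries memory across timesteps,
        # letting the policy implicitly estimate the env dynamics.
        x, hidden = self.lstm(x, hidden)
        action = torch.tanh(self.fc_out(x))
        return action, hidden
```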