quantumiracle/Popular-RL-Algorithms

why input last action to lstm policy network

junhuang-ifast opened this issue · 1 comment

Hi, may I ask why, for SAC, you input the last action into the LSTM policy network, when this is not usually done in a normal (non-LSTM) policy network? Is this based on any studies, or on your own work? And does it improve learning compared to using states as the only input? Thanks.
https://github.com/quantumiracle/SOTA-RL-Algorithms/blob/e3fc2de8493ea974081a10112c5005d755172787/common/policy_networks.py#L319

Hi,
Feeding the last action into an LSTM policy is common practice: together with the current state, the (state, last action) pair lets the recurrent network implicitly infer the environment dynamics, which helps under partial observability or randomized dynamics. See:
Memory-based control with recurrent neural networks (RDPG): http://rll.berkeley.edu/deeprlworkshop/papers/rdpg.pdf
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization: https://arxiv.org/pdf/1710.06537.pdf
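
For illustration, here is a minimal sketch of the idea: the policy concatenates the current state with the previous action before the LSTM, so the hidden state can accumulate information about the transition dynamics. This is an assumption-laden toy example (layer sizes, names, and the tanh squashing are illustrative), not the exact architecture in common/policy_networks.py:

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Toy recurrent policy conditioned on (state, last_action).

    Illustrative sketch only -- dimensions and layer choices are
    assumptions, not the repo's actual implementation.
    """
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        # Embed the concatenated (state, last_action) before the LSTM.
        self.fc_in = nn.Linear(state_dim + action_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.fc_out = nn.Linear(hidden_dim, action_dim)

    def forward(self, state, last_action, hidden=None):
        # state: (batch, seq_len, state_dim)
        # last_action: (batch, seq_len, action_dim)
        x = torch.cat([state, last_action], dim=-1)
        x = torch.relu(self.fc_in(x))
        # The LSTM hidden state carries memory across timesteps,
        # letting the policy implicitly estimate the env dynamics.
        x, hidden = self.lstm(x, hidden)
        action = torch.tanh(self.fc_out(x))
        return action, hidden
```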