The sparse reward settings in MuJoCo
GaoHaoCN opened this issue · 0 comments
The sparse reward setting for the MuJoCo tasks is confusing. Once the agent is a fixed distance d (2 or 20) from the starting point (i.e., 0), it receives a reward of 1 at every step and in every state beyond that distance. With this setup proposed by the authors, it doesn't feel like a sparse reward problem. In addition, the setting seems to encourage the agent merely to leave the circle of radius d: once outside, it gets a reward at every step even if it stands still, whereas the original dense reward encourages the agent to keep going farther. So this modification changes the intent of the original task. Finally, I tried modifying the reward so that the agent gets a reward of 1 for each additional distance d traveled, but I found that this approach did not work.
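To make the comparison concrete, here is a minimal sketch of the two sparse schemes as I understand them. It is not the repository's code: the function names `threshold_reward` and `milestone_rewards` are hypothetical, and I assume a 1-D position measured from the start at 0, with distance taken as the absolute value.

```python
def threshold_reward(x, d):
    """Sparse setting as I understand it (assumption): reward 1 at every
    step where the agent is at least distance d from the start."""
    return 1.0 if abs(x) >= d else 0.0


def milestone_rewards(positions, d):
    """My modification (assumption): reward 1 for each new multiple of d
    that the agent reaches for the first time, and 0 otherwise."""
    rewards = []
    best_level = 0  # highest multiple of d reached so far
    for x in positions:
        level = int(abs(x) // d)
        if level > best_level:
            # One unit of reward per newly crossed milestone.
            rewards.append(float(level - best_level))
            best_level = level
        else:
            rewards.append(0.0)
    return rewards


# Hypothetical trajectory: the agent crosses d = 2, stands still for two
# steps, then moves on.
positions = [0.0, 1.0, 2.5, 2.5, 2.5, 4.1]
d = 2.0

threshold = [threshold_reward(x, d) for x in positions]
milestone = milestone_rewards(positions, d)
```

Under the threshold scheme the agent keeps collecting reward while standing still at 2.5, whereas the milestone scheme only pays when a new multiple of d is crossed.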
If anyone else has looked into this question, please help clear up my doubts. Thank you very much!