masato-ka/airc-rl-agent

Question and suggestion on the reward function for the simulator


The reward the simulator returns at the end of an episode (on crash) differs from JetRacer's, so I think it would be more natural to make them consistent, as follows.

# JetRacer
return config.reward_reward_crash() - (config.reward_crash_reward_weight() * norm_throttle), done

# Simulator (current)
return config.reward_reward_crash() + config.reward_crash_reward_weight() * (self.speed / 18.0)
# Simulator (propose)
return config.reward_reward_crash() - config.reward_crash_reward_weight() * (self.speed / 18.0)

https://github.com/masato-ka/airc-rl-agent/blob/c938464604065215b29d06d99855b92bc81fec0e/learning_racer/sac/hyperparam.py
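To make the sign difference concrete, here is a minimal, self-contained sketch in Python. Only the two return expressions are taken from the snippets above; the Config class, its return values, and the MAX_SPEED constant are hypothetical stand-ins for learning_racer's actual configuration, chosen just to illustrate the effect of the sign.

# Minimal sketch of the crash-reward logic under discussion.
# Assumes the crash reward is negative and the weight is positive,
# which matches how the proposed JetRacer-style formula would behave.

MAX_SPEED = 18.0  # speed normalization constant from the simulator snippet


class Config:
    """Hypothetical stand-in for learning_racer's config object."""

    def reward_reward_crash(self) -> float:
        return -10.0  # base penalty for crashing (assumed value)

    def reward_crash_reward_weight(self) -> float:
        return 0.5  # weight on the speed term (assumed value)


def crash_reward_current(config: Config, speed: float) -> float:
    # Current simulator behavior: the speed term is ADDED, so crashing
    # at high speed is penalized LESS than crashing at low speed.
    return config.reward_reward_crash() + config.reward_crash_reward_weight() * (speed / MAX_SPEED)


def crash_reward_proposed(config: Config, speed: float) -> float:
    # Proposed behavior, matching JetRacer: the speed term is SUBTRACTED,
    # so faster crashes receive a larger penalty.
    return config.reward_reward_crash() - config.reward_crash_reward_weight() * (speed / MAX_SPEED)


if __name__ == "__main__":
    cfg = Config()
    for v in (5.0, 18.0):
        print(f"speed={v:5.1f}  current={crash_reward_current(cfg, v):6.2f}"
              f"  proposed={crash_reward_proposed(cfg, v):6.2f}")

With the assumed values, the current formula gives -9.86 at speed 5 but -9.50 at speed 18, rewarding faster crashes; the proposed formula gives -10.14 and -10.50 respectively, penalizing them.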

If there is a reason for this difference, I would like to know it.

Thank you for your suggestion. Yes, that seems correct. I will fix this bug in the next release.

Fixed in v1.7.1 (1.7.0).