Reward shaping not removed in evaluation in CarRacing-From-Pixels-PPO
Closed this issue · 5 comments
Hi,
The figure and log in the README show scores > 1000, which, given CarRacing's design, should not be possible.
It turns out that the reward shaping in Wrapper.step()
is not removed during evaluation, and that leads to incorrect results.
After commenting out the relevant lines, I got an average score of 820 over 100 episodes.
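For clarity, the kind of evaluation I mean is just summing the raw environment reward per episode and averaging. A minimal sketch (the `make_env`/`policy` arguments are placeholders, not the repo's API):

```python
def evaluate(make_env, policy, episodes=100):
    """Average the raw (unshaped) episode return over `episodes` runs."""
    scores = []
    for _ in range(episodes):
        env = make_env()
        obs = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward  # raw environment reward only, no shaping terms
        scores.append(total)
    return sum(scores) / len(scores)
```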
Thanks for your note,
Could you point out which lines should be commented out, or share your version of Wrapper.step()?
Hi,
The following snippet is what I used for evaluation, let me know if this makes sense to you :)
```python
def step(self, action):
    total_reward = 0
    for i in range(action_repeat):
        img_rgb, reward, die, _ = env.step(action)
        # don't penalize "die state"
        # if die:
        #     reward += 100
        # green penalty
        # if np.mean(img_rgb[:, :, 1]) > 185.0:
        #     reward -= 0.05
        total_reward += reward
        # if no reward recently, end the episode
        # done = True if self.av_r(reward) <= -0.1 else False
        done = False
        if done or die:
            break
    img_gray = rgb2gray(img_rgb)
    self.stack.pop(0)
    self.stack.append(img_gray)
    assert len(self.stack) == img_stack
    return np.array(self.stack), total_reward, done, die
```
Hi,
I cannot agree with your version. For example, where is your "green penalty"? You need to penalize the car for driving onto the green field. Possibly the green threshold should be lower than 185, or the penalty should be more precise than -0.05.
Let's see if the following describes my points better.
- There are 2 versions of Wrapper.step(), 1 for training and 1 for evaluation.
- You can add whatever reward shaping you want in the training version, e.g., a penalty for driving onto the grass.
- You should not add reward shaping in the evaluation version. E.g., CarRacing is considered solved when the average reward is > 900, but it is not very fair if you add 100 upon die == True, or end the episode earlier when you notice the car is not running well, right?
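One way to keep the two versions from drifting apart is a single wrapper with a shaping flag. This is only a sketch of that idea, not the repo's actual code: the `shaping` flag, the `action_repeat` default, and the returned raw frame are all my assumptions, and the shaping terms just mirror the ones discussed in this thread.

```python
import numpy as np

class Wrapper:
    """Sketch: reward shaping is applied only when shaping=True (training)."""

    def __init__(self, env, shaping=True, action_repeat=8):
        self.env = env
        self.shaping = shaping          # True for training, False for evaluation
        self.action_repeat = action_repeat

    def step(self, action):
        total_reward = 0.0
        die = False
        for _ in range(self.action_repeat):
            img_rgb, reward, die, _ = self.env.step(action)
            if self.shaping:
                if die:
                    reward += 100       # training only: undo the -100 on die
                if np.mean(img_rgb[:, :, 1]) > 185.0:
                    reward -= 0.05      # training only: green penalty
            total_reward += reward
            if die:
                break
        # (frame stacking / rgb2gray omitted for brevity)
        return img_rgb, total_reward, False, die
```

Constructing it twice, `Wrapper(env, shaping=True)` for training and `Wrapper(env, shaping=False)` for evaluation, guarantees the evaluation path sees only the raw environment reward.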
The code snippet I used was for evaluation.
@lerrytang
Let us look at the OpenAI CarRacing environment code
https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py
lines 337-339
If the car goes outside the field, the reward is penalized by -100.
However, if the track is completed (the fully successful case), the reward is ALSO penalized by -100.
So, for fairness, we restore the reward with +100 in Wrapper.step().
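The restore described above amounts to a one-line adjustment; here is a hypothetical helper (not the repo's exact code) showing the intent:

```python
def restore_terminal_reward(raw_reward, die, training=True):
    """During training, add back the 100 the env subtracts on termination,
    so that finishing the track is not treated like going off the field."""
    if training and die:
        return raw_reward + 100.0
    return raw_reward  # evaluation: keep the raw environment reward
```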