Rafael1s/Deep-Reinforcement-Learning-Algorithms

Why did you remove the death penalty for solving CarRacing with PPO from raw pixels?

llucid-97 opened this issue · 6 comments

Hi, I've been looking through your code as a reference to figure out how to solve CarRacing-v0.


Mine works up to a point, then suffers a catastrophic performance crash.
The only difference I can find between my version and yours is that when the unwrapped environment is done (i.e., fails), the agent gets a big negative reward.
You removed this in your wrapper, and I don't understand why.

What's the significance of offsetting the reward there?

@ihexx Please show part of the relevant code in this thread and indicate the corresponding line/lines. This could help solve the issue.

Thanks for the quick response:

It's this line from your IPython notebook on CarRacing PPO, under block 2, class Wrapper:

    # don't penalize "die state"
    if die:
        reward += 100

I don't understand why it's there, but if I remove it I sometimes get these performance crashes.
I'm just trying to understand why you chose to use it.

@ihexx Possibly the problem with this part is that the variable "die" should actually be named "done".
The naming is historical. Look at the code of the function step() in gym/envs/box2d/car_racing.py. The variable "done" becomes True in 2 cases:

  1. The current track is over; the episode was successfully completed,
     so don't penalize it. In this case, the reward gets +100.
  2. The car went outside the field.
     Here the env has already penalized the state by -100,
     see the function step(). In this case the reward gets -100 + 100 = 0.
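The two cases above can be sketched as a small reward-shaping function. This is a hedged illustration of the offset under discussion, not the notebook's exact code; the function name and arguments are illustrative.

```python
def offset_done_reward(reward: float, done: bool) -> float:
    """Sketch of the wrapper's "+100 on done" offset (names are illustrative).

    Case 1: track completed -> the raw reward carries no -100 penalty,
            so the offset yields a +100 completion bonus.
    Case 2: car left the field -> step() already added -100,
            so the offset cancels it: -100 + 100 = 0.
    """
    if done:
        reward += 100.0
    return reward

# Case 2: env penalized the terminal step with -100 for leaving the field
print(offset_done_reward(-100.0, True))  # 0.0
# Case 1: track finished on a step worth, say, +5
print(offset_done_reward(5.0, True))     # 105.0
```

So the offset neutralizes the failure penalty while still rewarding successful completion, which is why the two terminal cases end up at 0 and +100 respectively.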

Ah ok, I understand that now, but I still don't understand why you're changing the reward, or why it works so well. What's wrong with keeping the -100 reward when the agent fails the environment?

@ihexx I think this is a matter of long and difficult tuning; see https://github.com/xtma/pytorch_car_caring/blob/master/train.py
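If the terminal penalty is really a tunable quantity, one way to experiment with it is to expose it as a hyperparameter of the reward wrapper. A minimal sketch, assuming a hypothetical `death_penalty` parameter and `failed` flag (neither comes from the linked repo):

```python
class RewardShaper:
    """Illustrative reward shaper with a tunable terminal ("death") penalty.

    death_penalty=0.0 reproduces the "no death penalty" behaviour discussed
    above; larger values restore a negative reward on failure.
    """

    def __init__(self, death_penalty: float = 0.0):
        self.death_penalty = death_penalty

    def __call__(self, reward: float, done: bool, failed: bool) -> float:
        # Only penalize terminal steps that ended in failure,
        # not successful track completions.
        if done and failed:
            reward -= self.death_penalty
        return reward

# Sweep the penalty as just another hyperparameter:
for penalty in (0.0, 10.0, 100.0):
    shaper = RewardShaper(death_penalty=penalty)
    print(penalty, shaper(0.0, done=True, failed=True))
```

Framing it this way makes the death penalty something to grid-search alongside the usual PPO hyperparameters rather than a fixed property of the environment.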

ah ok, that makes sense. I'd just never thought of that as a parameter to tune before. Thanks