seungeunrho/minimalRL

Wrong td_target and test() call in a3c implementation

rahulptel opened this issue · 1 comment

First of all, this is a really nice repo - simple and clean.

I have two issues with the a3c implementation:

  1. The td_target calculated in https://github.com/seungeunrho/minimalRL/blob/master/a3c.py#L70 weights the value of s_prime (the last state visited) by a single factor of gamma, no matter how many steps away it is. If s is your starting state and you are doing an n-step return, the target should be \sum_{i=0}^{n-1} \gamma^{i} r_i + \gamma^{n} V(s').

You can fix this by changing https://github.com/seungeunrho/minimalRL/blob/master/a3c.py#L59 to
`R = 0.0 if done else model.v(s_prime).detach().item()`
and accumulating the discounted rewards backwards into R; td_target is then R_batch (see the sketch after this list).

  2. test might be executed before training is complete. If you plan to probe how good the model is during training, then this is alright. But if you wish to see the model's performance once training is complete, you should fire the test only after calling .join() on the train processes (see the second sketch below).
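
For the first point, here is a minimal sketch of the corrected n-step target computation. The helper name n_step_td_target is hypothetical; r_lst, s_prime, done, gamma, and model.v follow the names in a3c.py, but the surrounding rollout code is assumed rather than quoted from the repo:

```python
import torch

def n_step_td_target(r_lst, s_prime, done, gamma, model):
    """Hypothetical helper: n-step targets sum_i gamma^i r_i + gamma^n V(s')."""
    # Bootstrap from the value of the last state visited; zero if the episode ended.
    R = 0.0 if done else model.v(torch.tensor(s_prime, dtype=torch.float)).detach().item()

    # Walk the rollout backwards: each pass multiplies the bootstrap by another
    # gamma, so the first transition ends up with gamma^n V(s') as required.
    td_target_lst = []
    for reward in r_lst[::-1]:
        R = gamma * R + reward
        td_target_lst.append([R])
    td_target_lst.reverse()

    return torch.tensor(td_target_lst)  # shape (n, 1); this is the R_batch above
```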
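
And for the second point, a sketch of the main-process ordering, assuming the ActorCritic, train, test, and n_train_processes names from a3c.py (in particular, that test takes the global model as its only argument):

```python
import torch.multiprocessing as mp

if __name__ == '__main__':
    global_model = ActorCritic()   # network class defined in a3c.py
    global_model.share_memory()    # share weights across worker processes

    # Launch only the training workers first.
    processes = []
    for rank in range(n_train_processes):
        p = mp.Process(target=train, args=(global_model, rank))
        p.start()
        processes.append(p)

    # Block until every trainer has finished...
    for p in processes:
        p.join()

    # ...and only then evaluate the fully trained model.
    test(global_model)
```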

Oh my.. you're correct. I hadn't noticed the bug because the old td_target actually worked.
Maybe that is due to the small update_interval, which is 5.
If it were bigger, the old td_target might have ruined the performance.
I've modified the code for calculating td_target as you suggested.

Regarding the second issue, I wanted to check the performance of the model during training.
I appreciate your thoughtful comment, thank you.