seungeunrho/minimalRL

Wrong td_target and test() call in a3c implementation

rahulptel opened this issue · 1 comment

First of all, this is a really nice repo - simple and clean.

I have two issues with the a3c implementation:

  1. The td_target calculated in https://github.com/seungeunrho/minimalRL/blob/master/a3c.py#L70 weights the value of s_prime (the last state visited) by a single factor of gamma, no matter how many steps away it is. If s is your starting state and you are doing an n-step return, the target should be \sum_{i=0}^{n-1} \gamma^{i} r_i + \gamma^{n} V(s').

You can fix this by changing https://github.com/seungeunrho/minimalRL/blob/master/a3c.py#L59 to
`R = 0.0 if done else model.v(s_prime).detach().item()`
and accumulating the discounted rewards backwards into R; td_target is then R_batch (see the sketch after this list).

  2. test might be executed before training is complete. If you plan to probe how good the model is during training, then this is alright. But if you wish to see the model's performance once training is complete, you should fire the test only after calling .join() on the train processes (see the second sketch below).
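
For the first point, here is a minimal sketch of the corrected n-step target computation. The helper name n_step_td_target is hypothetical; r_lst, s_prime, done, gamma, and model.v follow the names in a3c.py, but the surrounding rollout code is assumed rather than quoted from the repo:

```python
import torch

def n_step_td_target(r_lst, s_prime, done, gamma, model):
    """Hypothetical helper: n-step targets sum_i gamma^i r_i + gamma^n V(s')."""
    # Bootstrap from the value of the last state visited; zero if the episode ended.
    R = 0.0 if done else model.v(torch.tensor(s_prime, dtype=torch.float)).detach().item()

    # Walk the rollout backwards: each pass multiplies the bootstrap by another
    # gamma, so the first transition ends up with gamma^n V(s') as required.
    td_target_lst = []
    for reward in r_lst[::-1]:
        R = gamma * R + reward
        td_target_lst.append([R])
    td_target_lst.reverse()

    return torch.tensor(td_target_lst)  # shape (n, 1); this is the R_batch above
```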
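
And for the second point, a sketch of the main-process ordering, assuming the ActorCritic, train, test, and n_train_processes names from a3c.py (in particular, that test takes the global model as its only argument):

```python
import torch.multiprocessing as mp

if __name__ == '__main__':
    global_model = ActorCritic()   # network class defined in a3c.py
    global_model.share_memory()    # share weights across worker processes

    # Launch only the training workers first.
    processes = []
    for rank in range(n_train_processes):
        p = mp.Process(target=train, args=(global_model, rank))
        p.start()
        processes.append(p)

    # Block until every trainer has finished...
    for p in processes:
        p.join()

    # ...and only then evaluate the fully trained model.
    test(global_model)
```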

Oh my.. you're correct. I hadn't noticed the bug because the old td_target actually worked.
Maybe that is due to the small update_interval, which is 5.
If it were bigger, the old td_target might have ruined the performance.
I've modified the code for calculating td_target as you suggested.

Regarding the second issue, I wanted to check the performance of the model during training.
I appreciate your thoughtful comment, thank you.