Wrong td_target and test() call in a3c implementation
rahulptel opened this issue · 1 comment
First of all, this is a really nice repo - simple and clean.
I have two issues with the a3c implementation:

- The `td_target` calculated in https://github.com/seungeunrho/minimalRL/blob/master/a3c.py#L70 gives the same weight of `gamma` to the value of `s_prime` (the last state visited). Let's say `s` is your starting state and you are doing an n-step return; then the target should be \sum_{i=0}^{n-1} \gamma^{i} r_i + \gamma^{n} v(s\_prime). You can change https://github.com/seungeunrho/minimalRL/blob/master/a3c.py#L59 to the following: `R = 0.0 if done else model.v(s_prime).detach().item()`, and then set `td_target = R_batch`.
- `test` might be executed before the training is complete. If you plan to probe how good the model is during training, then this is alright. But if you wish to see the model's performance once training is complete, then you should fire `test` after calling `.join()` on the `train` processes.
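The fix for the first issue can be sketched as follows. This is a minimal sketch, not the repo's code: `n_step_td_targets` and `v_s_prime` are hypothetical names, where `v_s_prime` stands in for `model.v(s_prime).detach().item()`:

```python
def n_step_td_targets(rewards, v_s_prime, done, gamma=0.98):
    """Return sum_{i=0}^{n-1-t} gamma^i * r_{t+i} + gamma^{n-t} * v(s_prime)
    for each step t, by discounting backwards from the bootstrap value."""
    R = 0.0 if done else v_s_prime   # bootstrap only if the episode did not end
    targets = []
    for r in reversed(rewards):      # walk backwards so gamma compounds correctly
        R = r + gamma * R
        targets.append(R)
    targets.reverse()
    return targets

# Example: 3-step rollout, episode not done, bootstrap value 10.0
print(n_step_td_targets([1.0, 1.0, 1.0], v_s_prime=10.0, done=False, gamma=0.5))
# → [3.0, 4.0, 6.0]
```

With this recursion the bootstrap value is correctly weighted by `gamma**n` for the first step of the rollout, instead of every step receiving a single factor of `gamma`.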
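The ordering suggested in the second issue can be sketched like this, with hypothetical stubs standing in for the actual `train` and `test` workers in a3c.py:

```python
import multiprocessing as mp

def train(rank):
    # stand-in for the a3c.py training worker
    print(f"worker {rank} training")

def test():
    # stand-in for the evaluation worker
    print("evaluating final model")

if __name__ == "__main__":
    processes = []
    for rank in range(3):
        p = mp.Process(target=train, args=(rank,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()   # wait for every training worker to finish...
    test()         # ...so test() sees the fully trained model
```

Calling `test()` only after `.join()` returns on every `train` process guarantees the evaluation sees the final model; launching `test` as a sibling process instead probes the model while training is still in progress.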
Oh my.. you're correct. I hadn't noticed the bug because the old `td_target` actually worked. Maybe it is due to the small `update_interval`, which is 5. If it were bigger, the old `td_target` might have ruined the performance. I've modified the code for calculating `td_target` as you suggested.
Regarding the second issue, I wanted to check the performance of the model during training.
I appreciate your thoughtful comment, thank you.