Question about how the TD and MC methods update the last state value in an MRP
xingE650 opened this issue · 1 comment
This is my first time using GitHub issues, so please forgive me if I get anything wrong. In Chapter 06, Example 6.4, when I try to use the batch-updating code (batch_updating(method, episodes, alpha=0.001), at line 132) in random_walk.py, I find that the value of the last state in an episode is never updated, because of the following code:
for i in range(0, len(trajectory_) - 1):
at line 153 of chapter06/random_walk.py.
I think the value of the last state should also be updated by the MC method, i.e.:
updates[trajectory_[i]] += rewards_[i] - current_values[trajectory_[i]]
and I think the TD(0) update for the last state's value should be the same as the MC update.
Thank you very much if you can answer my question. I am new to RL, and your code has helped me a lot. Thank you!
The value of a terminal state (the last state of an episode) is always 0, since no reward can follow it. The last entry of each stored trajectory is that terminal state, so the loop's upper bound of len(trajectory_) - 1 only skips the terminal state; every non-terminal state, including the one visited just before termination, still gets updated. Feel free to correct me if I misunderstood your question.
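For anyone else hitting this, below is a minimal, self-contained sketch of batch updating on the 5-state random walk. It is not the repository's exact code: the helper names (generate_episode, batch_update), the convergence threshold, and the episode counts are my own assumptions, but the loop structure mirrors the idea discussed above. The point to notice is that the last entry of each trajectory is the terminal state (0 or 6), so `for i in range(len(trajectory) - 1)` excludes only that terminal index; the state visited right before termination is still updated, and its TD target uses V(terminal) = 0, which for that final step coincides with the MC target.

```python
# Minimal sketch of batch TD(0) vs batch MC on the 5-state random walk.
# Assumptions (not from the repository): helper names, threshold 1e-3, seed 0.
import numpy as np

N_STATES = 7                      # states 0..6; 0 and 6 are terminal, start at 3
TRUE_VALUES = np.zeros(N_STATES)
TRUE_VALUES[1:6] = np.arange(1, 6) / 6.0

def generate_episode(rng):
    """One random-walk episode: visited states (terminal state included last)
    and the reward received on each step (+1 only when reaching state 6)."""
    state = 3
    trajectory = [state]
    rewards = []
    while state not in (0, 6):
        state += rng.choice([-1, 1])
        trajectory.append(state)
        rewards.append(1.0 if state == 6 else 0.0)
    return trajectory, rewards

def batch_update(method, episodes, alpha=0.001, rng=None):
    """Batch updating: after each new episode, sweep all stored episodes
    repeatedly until the accumulated value increments become negligible."""
    if rng is None:
        rng = np.random.default_rng(0)
    values = np.full(N_STATES, 0.5)
    values[0] = values[6] = 0.0   # terminal values are 0 by definition
    trajectories, all_rewards = [], []
    for _ in range(episodes):
        t, r = generate_episode(rng)
        trajectories.append(t)
        all_rewards.append(r)
        while True:
            updates = np.zeros(N_STATES)
            for trajectory, rewards in zip(trajectories, all_rewards):
                returns = np.cumsum(rewards[::-1])[::-1]   # MC return from each step
                # The last index of `trajectory` is the terminal state, which is
                # skipped on purpose: its value stays 0 and is never updated.
                for i in range(len(trajectory) - 1):
                    if method == 'TD':
                        # For the step entering the terminal state this equals
                        # rewards[i] + 0, i.e. the same target MC would use.
                        target = rewards[i] + values[trajectory[i + 1]]
                    else:  # MC
                        target = returns[i]
                    updates[trajectory[i]] += target - values[trajectory[i]]
            updates *= alpha
            if np.abs(updates).sum() < 1e-3:
                break
            values += updates
    return values

if __name__ == '__main__':
    print('TD  :', np.round(batch_update('TD', 100)[1:6], 3))
    print('MC  :', np.round(batch_update('MC', 100)[1:6], 3))
    print('true:', np.round(TRUE_VALUES[1:6], 3))
```

Running this, both methods' estimates for states 1..5 end up close to the true values (1/6, ..., 5/6), even though the terminal state itself is never touched by the update loop.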