ShangtongZhang/reinforcement-learning-an-introduction

Problem with how the TD and MC methods update the last state value in an MRP

xingE650 opened this issue · 1 comment

This is my first time using GitHub issues, so please forgive me if I offend you. In Chapter 06, Example 6.4, when I try to use the batch-updating code (batch_updating(method, episodes, alpha=0.001), at line 132) in random_walk.py, I find that the value of the last state in an episode cannot be updated, because of the following code:

for i in range(0, len(trajectory_) - 1):

at line 153 of chapter06/random_walk.py.
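
If I understand the loop correctly, each index i is the transition from trajectory_[i] to trajectory_[i + 1], so it stops one short of the end and the final state never appears on the left-hand side. Here is a small runnable sketch of what I mean (my paraphrase with made-up numbers, not the file's exact code or initialization):

import numpy as np

# Hypothetical episode through the 5-state random walk
# (state 0 = left terminal, state 6 = right terminal): C -> D -> E -> right.
trajectory_ = [3, 4, 5, 6]
rewards_ = [0.0, 0.0, 1.0]                    # +1 only on reaching the right terminal
current_values = np.full(7, 0.5)
current_values[0] = current_values[6] = 0.0   # terminal values fixed at 0
updates = np.zeros(7)

# Batch TD(0): index i is the transition trajectory_[i] -> trajectory_[i + 1],
# so the loop stops before the final entry, which is the terminal state.
for i in range(0, len(trajectory_) - 1):
    updates[trajectory_[i]] += (rewards_[i]
                                + current_values[trajectory_[i + 1]]
                                - current_values[trajectory_[i]])
print(updates)  # only states 3, 4, 5 (C, D, E) receive updates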

I think the last state's value should be updated using the MC method:

updates[trajectory_[i]] += rewards_[i] - current_values[trajectory_[i]]

and I think the TD(0) update for the last state should be the same as the MC update.
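
Concretely, what I have in mind is extending the loop by one step so the final state also gets an MC-style update toward the episode's return (a sketch of my suggestion, reusing the hypothetical episode above; returns_ is a made-up list holding the return G_t for every visited state):

# My suggestion (sketch): include the last state in the MC-style update.
returns_ = [1.0, 1.0, 1.0, 1.0]        # hypothetical: undiscounted return G_t per state
for i in range(0, len(trajectory_)):   # no "- 1": the last state is included
    updates[trajectory_[i]] += returns_[i] - current_values[trajectory_[i]]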

Thank you very much if you can answer my question. I am a new learner of RL, and your code helps me a lot. Thank you!

The value of a terminal state (the last state of an episode) is always 0 by definition, since no rewards can follow it, so it never needs to be updated and the loop correctly stops one index early. Feel free to correct me if I misunderstood your question.
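
To make that concrete: because V(terminal) is fixed at 0, the TD(0) target for the transition into the terminal state, r + V(terminal), collapses to r, so for the last nonterminal state the TD(0) and MC updates already agree. A tiny check with purely illustrative numbers:

V_terminal = 0.0           # fixed at 0 by definition; never updated
r = 1.0                    # reward for reaching the right terminal
v_E = 0.5                  # current estimate for state E, the last nonterminal state

td_error = r + V_terminal - v_E   # TD(0) error for the final transition
mc_error = r - v_E                # MC error (the return here is just r)
assert td_error == mc_error       # the two updates coincide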