ikostrikov/pytorch-ddpg-naf

Typo in the Implementation

Akella17 opened this issue · 1 comment

The target is currently computed as:

expected_state_action_values = reward_batch + (self.gamma * mask_batch + next_state_values)

Current target: r_t + \gamma * mask + v_{t+1}
Correct target: r_t + \gamma * mask * v_{t+1}

The `+` before next_state_values should be a `*`, so that the next-state value is discounted and masked rather than added on its own.
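
For reference, a minimal standalone sketch of the corrected target computation; the tensor values below are made-up placeholders, not taken from the repo, where these quantities come from a sampled replay-buffer batch and the target network's value estimate:

```python
import torch

# Placeholder batch (hypothetical values, for illustration only)
gamma = 0.99
reward_batch = torch.tensor([1.0, 0.0, -1.0])
mask_batch = torch.tensor([1.0, 1.0, 0.0])        # 0 where the episode terminated
next_state_values = torch.tensor([0.5, 0.2, 0.8])

# Correct target: r_t + gamma * mask * v_{t+1}.
# The mask zeroes out the bootstrap term at terminal states instead of
# adding the raw next-state value as the buggy line did.
expected_state_action_values = reward_batch + gamma * mask_batch * next_state_values
print(expected_state_action_values)  # tensor([ 1.4950,  0.1980, -1.0000])
```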

Fixed in d900aa6