DQN_Cartpole

Refer to https://zhuanlan.zhihu.com/p/21477488

(1) Action response:

  • (state, action) → (reward, next_state): taking an action in a state yields a reward and the next state
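
The transition above can be sketched as a single function. This is a minimal illustration on a hypothetical 5-cell corridor, not the CartPole dynamics; the name `step`, the state encoding, and the reward rule are all assumptions made for the example.

```python
from typing import Tuple


def step(state: int, action: int) -> Tuple[float, int]:
    """Toy transition (hypothetical): positions 0..4, action 0 = left, 1 = right."""
    if action not in (0, 1):
        raise ValueError("action must be 0 or 1")
    move = 1 if action == 1 else -1
    # Clamp the position so the agent stays inside the corridor.
    next_state = min(4, max(0, state + move))
    # Assumed reward rule: 1.0 for reaching the right end, 0.0 otherwise.
    reward = 1.0 if next_state == 4 else 0.0
    return reward, next_state


reward, next_state = step(3, 1)
print(reward, next_state)
```

In a real CartPole setup the environment library would play the role of `step`; only the shape of the mapping, (state, action) to (reward, next_state), carries over.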

(2) DQN

  • input: state, action
  • output: Q(state, action), whose target value is reward + γ · max over all actions a' of Q(next_state, a')
  • Q_action is the action-value function
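
The output expression above can be computed in a few lines. This is a sketch under assumptions: `td_target` is a hypothetical helper name, `next_q_values` stands for the Q estimates of every action in the next state, and the `done` flag (dropping the bootstrap term when the episode has ended) is common practice that is not stated in these notes.

```python
from typing import Sequence


def td_target(reward: float,
              next_q_values: Sequence[float],
              gamma: float = 0.99,
              done: bool = False) -> float:
    """Return reward + gamma * max(Q(next_state, all actions))."""
    if done:
        # Assumption: no future value is added after a terminal transition.
        return reward
    return reward + gamma * max(next_q_values)


# Example: reward 1.0, two actions with next-state Q estimates 0.5 and 2.0.
target = td_target(1.0, [0.5, 2.0], gamma=0.9)
print(target)
```

During training, Q(state, action) would be regressed toward this value, for example with a squared-error loss.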