Menglinucas/DQN_Cartpole

Python

DQN_Cartpole

Refer to https://zhuanlan.zhihu.com/p/21477488

(1) Action response:

state + action = reward + next_state
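
A minimal sketch of this relation, assuming a hypothetical toy environment (an integer position on a line, not CartPole's actual physics). The function name `step`, the bounds, and the reward values are illustrative assumptions and are not taken from this repository:

```python
from typing import Tuple


def step(state: int, action: int) -> Tuple[float, int, bool]:
    """Toy transition: state + action -> reward + next_state.

    Assumption: state is an integer position; action 1 moves right,
    any other action moves left. This is NOT the CartPole dynamics.
    """
    next_state = state + (1 if action == 1 else -1)
    # Assumed termination rule: the episode ends once the position leaves [-2, 2].
    done = abs(next_state) > 2
    # Assumed reward: 1.0 for every step that keeps the episode alive.
    reward = 0.0 if done else 1.0
    return reward, next_state, done


reward, next_state, done = step(0, 1)
print(reward, next_state, done)
```

In a real setup the environment (for example a CartPole simulator) would play the role of `step`; the agent only observes the returned reward and next state.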

(2) DQN

input: state, action
output: reward + γ•max(Q(next state, all actions))
Q_action is the action-value function.
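
The output expression above can be sketched in plain Python as follows. The function name `td_target`, the default `gamma`, and the `done` handling (dropping the future term when the episode has ended) are assumptions for illustration, not code from this repository:

```python
from typing import Sequence


def td_target(
    reward: float,
    next_q_values: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Compute reward + gamma * max(Q(next_state, all actions)).

    next_q_values holds one Q estimate per action for the next state.
    Assumption: when the episode is over there is no future value,
    so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Two actions with assumed Q estimates 0.5 and 2.0 for the next state.
target = td_target(1.0, [0.5, 2.0], gamma=0.9)
print(target)
```

During training, the network's Q estimate for the chosen (state, action) pair would typically be regressed toward this value.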