deep_q_network

Implements a Deep Q-Network (DQN) on a simple pendulum environment, in which the joint angle and angular velocity are continuous and the torque is discretized.

Results on the double pendulum
Training
Average cost-to-go
Rendering
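The sketch below shows the pieces of DQN that are specific to this setup: a continuous state (angle, angular velocity), a torque discretized into a finite action set, and the one-step DQN target. It rests on several assumptions. First, because the README reports a cost-to-go, Q-values are treated here as costs, so the greedy action and the bootstrap target use a minimum rather than a maximum. Second, θ = 0 is taken to be the upright position. Third, every constant (torque bound, number of torque levels, time step, cost weights) and every helper name (`torque_levels`, `step`, `dqn_targets`, `epsilon_greedy`) is hypothetical and does not come from the repository. The Q-network itself is left out. Any callable that maps a state to one value per discrete torque can stand in for it.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (theta, omega): both continuous

# Hypothetical constants, chosen for illustration only.
MAX_TORQUE = 2.0           # torque bound [N*m]
N_ACTIONS = 11             # number of discrete torque levels
DT = 0.05                  # integration step [s]
G, M, L = 9.81, 1.0, 1.0   # gravity, mass, pendulum length


def torque_levels(max_torque: float, n: int) -> List[float]:
    """Discretize the torque into n evenly spaced values in [-max_torque, max_torque]."""
    spacing = 2.0 * max_torque / (n - 1)
    return [-max_torque + i * spacing for i in range(n)]


ACTIONS = torque_levels(MAX_TORQUE, N_ACTIONS)


def angle_normalize(theta: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def step(state: State, action_idx: int) -> Tuple[State, float]:
    """Advance the continuous state by one step under a discrete torque.

    theta = 0 is assumed to be upright, so gravity is destabilizing:
    theta_ddot = (g / l) * sin(theta) + u / (m * l^2).
    Returns (next_state, stage_cost); the cost weights are assumed.
    """
    theta, omega = state
    u = ACTIONS[action_idx]
    cost = angle_normalize(theta) ** 2 + 0.1 * omega ** 2 + 0.001 * u ** 2
    # Semi-implicit Euler: update velocity first, then angle with the new velocity.
    omega_next = omega + DT * (G / L * math.sin(theta) + u / (M * L ** 2))
    theta_next = angle_normalize(theta + DT * omega_next)
    return (theta_next, omega_next), cost


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore uniformly with probability epsilon, else pick the lowest-cost action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return min(range(len(q_values)), key=lambda i: q_values[i])


def dqn_targets(
    batch: Sequence[Tuple[State, int, float, State, bool]],
    q_target: Callable[[State], Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """One-step targets y = c + gamma * min_a' Q_target(s', a'); y = c at terminal states."""
    targets = []
    for _state, _action, cost, next_state, done in batch:
        y = cost if done else cost + gamma * min(q_target(next_state))
        targets.append(y)
    return targets
```

In a full DQN loop, transitions produced by `step` go into a replay buffer. The online network is then regressed toward `dqn_targets` computed with a periodically synchronized target network. Discretizing the torque is what makes the `min` over actions a simple enumeration: it is taken over `N_ACTIONS` outputs instead of requiring an optimization over a continuous action.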