solution for pendulum on cart using DDP
innChow opened this issue · 6 comments
@baggepinnen Hi there! You mentioned a solution for the pendulum on a cart using DDP, but I am not familiar with Julia, so I cannot read the code. I am struggling with the problem and still have no idea how the DDP algorithm can solve it when the control torque is limited. Would you mind sharing some reference materials or giving me some hints directly? Thanks a lot!
Sure, have a look at the following article, which describes the algorithm for box-constrained DDP:
https://homes.cs.washington.edu/~todorov/papers/TassaICRA14.pdf
If there is no issue with this repository, feel free to close this issue. You can continue commenting here if you have further questions!
So you did the same projected-Newton QP optimization that the paper presents, and it works for the cart-pole problem with control constraints? One thing confuses me: to swing the pole up by pushing the cart, the cart has to move back and forth to build up the pole's momentum, which means the pole will sometimes move away from the target. However, the algorithm tries to minimize the cost at each backward-pass step, forcing the pole to move towards the target. Isn't that contradictory?
To be clear, which example are you talking about?
In any case, the optimization algorithm optimizes the control-signal trajectory to minimize the cost over the entire horizon; it does not greedily choose the action that is best for a single time point.
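To spell out why minimizing at each backward step is not greedy, here is the standard finite-horizon recursion. The symbols (`\ell`, `\ell_f`, `f`, `N`) are my own generic notation, not names taken from the repository:

```latex
J(x_0, u_{0:N-1}) = \sum_{t=0}^{N-1} \ell(x_t, u_t) + \ell_f(x_N),
\qquad x_{t+1} = f(x_t, u_t)

V_N(x) = \ell_f(x), \qquad
Q_t(x, u) = \ell(x, u) + V_{t+1}\big(f(x, u)\big), \qquad
V_t(x) = \min_u Q_t(x, u)
```

The quantity minimized at step t is Q_t, which contains V_{t+1}, the cost of everything that follows. An action that temporarily moves the pole away from upright can therefore still minimize Q_t if it lowers the remaining cost.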
There are some details I don't understand in the paper you recommended. DDP typically solves the optimization problem <min Q(dx, du)> to get the control gains k and K, and then computes the derivative Vx and the second derivative Vxx. In the constrained case, the problem becomes <min Q(dx, du) s.t. some constraints>. How did he get k and K, and also Vx and Vxx, in that case? I couldn't figure out what the algorithm in the appendix does.
Solving the box-constrained QP problem is what the algorithm in the appendix does. The implementation of it is here
https://github.com/baggepinnen/DifferentialDynamicProgramming.jl/blob/master/src/boxQP.jl
This routine is called here and just below that is the calculation of the value-function matrices.
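For readers who don't read Julia, here is a minimal Python sketch of one backward-pass step with a single box-limited control, which is the cart-pole case. It is not a port of `boxQP.jl`: with a scalar control the box QP reduces to clamping the unconstrained step, so no projected-Newton iterations are needed. The function name and argument layout are hypothetical, and I assume `Quu > 0` (already regularized) and `lo <= hi`. When the step is clamped, the feedback gain K is set to zero. Vx and Vxx then follow from substituting du = k + K dx into the quadratic model of Q, which holds for any k and K.

```python
def clamped_backward_step(Qx, Qu, Qxx, Quu, Qux, u, lo, hi):
    """One DDP backward step for a scalar control with lo <= u + du <= hi.

    Qx  : list of n floats   (gradient of Q w.r.t. the state)
    Qu  : float              (gradient of Q w.r.t. the control)
    Qxx : n x n list of lists
    Quu : float, assumed > 0 (regularize beforehand)
    Qux : list of n floats   (mixed second derivative, a 1 x n row)
    Hypothetical helper for illustration, not the repository's API.
    """
    if Quu <= 0.0:
        raise ValueError("Quu must be positive; regularize it first")
    n = len(Qx)

    # Scalar box QP: min 0.5*Quu*du^2 + Qu*du  s.t.  lo - u <= du <= hi - u.
    k_free = -Qu / Quu
    k = min(max(k_free, lo - u), hi - u)
    clamped = k != k_free

    # A clamped control cannot react to state perturbations, so its gain is 0.
    K = [0.0] * n if clamped else [-q / Quu for q in Qux]

    # Value-function derivatives, valid for any k and K:
    #   Vx  = Qx  + K'*Quu*k + K'*Qu  + Qux'*k
    #   Vxx = Qxx + K'*Quu*K + K'*Qux + Qux'*K
    Vx = [Qx[i] + K[i] * Quu * k + K[i] * Qu + Qux[i] * k for i in range(n)]
    Vxx = [
        [Qxx[i][j] + K[i] * Quu * K[j] + K[i] * Qux[j] + Qux[i] * K[j]
         for j in range(n)]
        for i in range(n)
    ]
    return k, K, Vx, Vxx


if __name__ == "__main__":
    # Made-up numbers, only to show the call.
    print(clamped_backward_step([1.0, 2.0], 4.0, [[2.0, 0.0], [0.0, 2.0]],
                                2.0, [1.0, -1.0], u=0.0, lo=-1.0, hi=1.0))
```

With several controls the same idea applies per dimension: the box QP decides which controls are clamped, K is computed only on the free ones, and the rows of K for clamped controls are zero.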