StanfordASL/neural-network-lyapunov

Better understanding of MILP optimization

AlessiaLiNoce opened this issue · 5 comments

Hi, I'm trying to understand the starting conditions for the state in the derivative and positivity MILP formulations. In the default training I give only an empty vector as input, so what values are used for the state in the MILP model? Are they random values inside the safe box [x_lo, x_up], or a default value chosen by Gurobi itself?

The values of the state are uninitialized by default in our code. Internally, Gurobi probably initializes them to certain values; see Gurobi's documentation on the `Start` attribute. In mixed-integer optimization, this warm start should not affect the final solution you get (provided the optimizer reaches the global optimum), but it can affect the solver speed. So it is sometimes beneficial to initialize the variables to speed up the Gurobi solver.
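
For reference, here is a minimal gurobipy sketch (not code from this repo) of how a MIP warm start is set through the `Start` attribute:

```python
import gurobipy as gp

# Minimal sketch, assuming gurobipy is installed; this is not code from
# this repo. Setting the `Start` attribute gives Gurobi an initial guess
# for the MIP; it may speed up the solve but does not change the global
# optimum.
model = gp.Model("warmstart_demo")
x = model.addVars(3, lb=-1.0, ub=1.0, name="x")
b = model.addVars(3, vtype=gp.GRB.BINARY, name="b")
# Couple the continuous and binary variables so the MIP is non-trivial.
model.addConstrs((x[i] <= b[i] for i in range(3)), name="link")
model.setObjective(x.sum() + b.sum(), gp.GRB.MAXIMIZE)

# Warm start: suggest initial values for some (or all) variables.
for i in range(3):
    x[i].Start = 1.0
    b[i].Start = 1
model.optimize()
```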

Our API allows initializing the state to certain values; you could check the x_warmstart argument in the MILP-construction functions.
Thanks for the explanation. Just one last question: what is the difference between train() and train_lyapunov_on_samples()? They both give a Lyapunov function and a controller with the Lyapunov conditions satisfied, right?

They both give a Lyapunov function and a controller with the Lyapunov conditions satisfied, right?

Right.

We implemented two algorithms; see our paper https://arxiv.org/pdf/2109.14152.pdf. train_lyapunov_on_samples() implements Algorithm 1 in that paper, and train() implements Algorithm 2.

In both algorithms, we first solve an MIP to find the worst adversarial states. In Algorithm 1, we append the adversarial states to the training data set, then apply batched gradient descent on that data set to fix the violation. The hope is that by fixing the violation on the training data set (a finite number of states), we can reduce the violation over all states (uncountably infinitely many) to zero. In Algorithm 2, we compute the gradient of the maximal violation of the inner MIP problem and apply gradient descent directly; there is no training data set in Algorithm 2. A rough sketch of the two loops is below.
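
A rough pseudocode sketch of the two loops (our paraphrase of the paper, not this repo's actual code; solve_verifier_mip, sample_loss, and the dataset interface are hypothetical placeholders):

```python
# Hypothetical sketch of the two training loops described above.
# `solve_verifier_mip` stands in for the MIP that finds the worst
# adversarial states and the maximal violation; it is not a real
# function in this repo.

def train_on_samples(params, dataset, optimizer):
    # Algorithm 1: grow a training set with adversarial states, then
    # run batched gradient descent on that finite set.
    while True:
        adv_states, violation = solve_verifier_mip(params)
        if violation <= 0:
            return params  # Lyapunov conditions verified everywhere.
        dataset.extend(adv_states)
        for batch in dataset.batches():
            optimizer.zero_grad()
            sample_loss(params, batch).backward()
            optimizer.step()


def train_on_mip_gradient(params, optimizer):
    # Algorithm 2: differentiate the maximal violation of the inner MIP
    # directly and descend on it; no training data set is kept.
    while True:
        _, violation = solve_verifier_mip(params)
        if violation <= 0:
            return params
        optimizer.zero_grad()
        violation.backward()  # gradient of the max violation w.r.t. params
        optimizer.step()
```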

Empirically, we find that Algorithm 1 works better for dynamical systems with small state dimension (like a pendulum with 2 states), while Algorithm 2 works better for dynamical systems with larger state dimension (like a 3D quadrotor with 12 states).

Okay, thanks for all the explanations. I will close this issue.