cvxgrp/cvxpylayers

Can one build a neural network whose parameters play the role of "Parameter"s in cvxpy?

JinraeKim opened this issue · 6 comments

Hi,

I'd like to know whether it's possible to build a neural network (especially with PyTorch) whose network parameters play the role of "Parameter"s in cvxpy.

For example, consider a neural network f(x, u, theta), where x is the condition variable and u is the decision variable. Suppose that f is constructed to be convex in u and that it internally contains some standard deep neural networks with parameters theta.

It is easy to build such a neural network with, e.g., PyTorch. However, I'd like to backpropagate the gradient of u_optimal, the argmin of f(x, \cdot, theta), with respect to theta.

In principle, this seems possible with cvxpylayers.
However, I cannot find a working example of it, and it's a bit tricky for me.

Is there anyone who has a good idea?

Yes, something like that is possible. The only constraint is that the parameters have to enter the CVXPY problem in a DPP-compliant way. Sometimes this can be a bit tricky, as you mentioned.

Have you looked at the examples in the GitHub repo? Your application sounds more complicated than our examples, but they might at least point you in the right direction.

@akshayka Thank you.
I've seen the examples :)
One way that I found would be as follows.

  1. Construct a neural network with some predefined layers, e.g., nn.Linear and F.relu.
  2. Write custom cvxpy functions corresponding to the predefined layers. For example,

    def relu_cp(x):
        return cp.pos(x)

     would (obviously) correspond to F.relu.
  3. Construct a cvxpylayer from the custom cvxpy functions so that it mirrors the NN in step 1 (if possible).
  4. Extract the torch parameters and inject them into the cvxpylayer for backpropagation.

@akshayka Thanks. I've tried it, and it works as long as the problem satisfies the DPP ruleset.
The DPP condition is restrictive, though, so this approach is not very general.

Unfortunately, my problem does not seem to satisfy the DPP ruleset.

I think I found a remedy.

I'll describe my problem more precisely, and then the remedy.

My problem is to differentiate the solution u to a given convex optimisation problem with respect to a network parameter theta.

In my case, the objective function of the convex optimisation problem is expressed as
J = logsumexp(f(x; theta).T @ u)
where u is the decision variable (optimisation variable), x is a given condition variable.
Since f is not affine in theta, I thought the problem was not DPP-compliant and therefore could not be used with cvxpylayers.

However, we can replace f(x; theta) with a matrix parameter A. Then

J = logsumexp(A.T @ u)

is DPP. Also, for a loss function l, the chain rule gives the derivative with respect to theta as
dl/dtheta = dl/dA * df/dtheta, evaluated at A = f(x; theta).

Let's take a look at the PyTorch example on the banner.

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_tch = torch.randn(m, n, requires_grad=True)
b_tch = torch.randn(m, requires_grad=True)

# solve the problem
solution, = cvxpylayer(A_tch, b_tch)

# compute the gradient of the sum of the solution with respect to A, b
solution.sum().backward()

If we pass f(x; theta) as an argument to the cvxpylayer, the gradient would backpropagate all the way to theta, not just to A.

I'll write some test scripts to verify this.

If we inject f(x; theta) as an argument of cvxpylayer, it would backpropagate to theta, not just A.

Yes, that's the right idea! Non-DPP transformations of data should happen in PyTorch, with the result then fed to the layer as the value of a parameter.