cvxgrp/cvxpylayers

Question: Different results for box constraints with torch.clamp vs cvxpylayers

philippkiesling opened this issue · 0 comments

Hi everybody,

I am training a reinforcement learning agent with a one-dimensional continuous action space. I want to restrict the action space so that the agent can only choose valid actions (the valid range changes depending on the current state).

Therefore I tried two approaches to limiting the action space: clamping the action with torch.clamp, and enforcing the box constraint with a cvxpylayer.

To my surprise, however, they yield completely different results: the torch.clamp version does not converge to a good solution, while the cvxpylayer version does.
Is there any explanation for this?
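
To make the comparison concrete, here is a minimal sketch of how the two approaches could be set up and their gradients compared. It is not the code from my training run: the function names (`clamp_action`, `make_projection`, `grad_wrt_raw`), the bounds, and the formulation of the cvxpylayer as a Euclidean projection onto the interval are all assumptions made for illustration.

```python
import torch


def clamp_action(raw, lo, hi):
    """Approach 1: box constraint via torch.clamp (lo and hi are tensors)."""
    return torch.clamp(raw, min=lo, max=hi)


def make_projection():
    """Approach 2 (assumed formulation): project onto [lo, hi] with a CvxpyLayer.

    cvxpy and cvxpylayers are imported here so that the clamp part of this
    sketch can be run without them installed.
    """
    import cvxpy as cp
    from cvxpylayers.torch import CvxpyLayer

    x = cp.Variable(1)
    raw, lo, hi = cp.Parameter(1), cp.Parameter(1), cp.Parameter(1)
    # minimize ||x - raw||^2  subject to  lo <= x <= hi
    problem = cp.Problem(cp.Minimize(cp.sum_squares(x - raw)), [x >= lo, x <= hi])
    layer = CvxpyLayer(problem, parameters=[raw, lo, hi], variables=[x])
    # The layer returns a tuple with one tensor per variable; keep the first.
    return lambda raw_t, lo_t, hi_t: layer(raw_t, lo_t, hi_t)[0]


def grad_wrt_raw(constrain, raw_value, lo_value, hi_value):
    """Return (action, d action / d raw) for a given constraint function."""
    raw = torch.tensor([raw_value], dtype=torch.float64, requires_grad=True)
    lo = torch.tensor([lo_value], dtype=torch.float64)
    hi = torch.tensor([hi_value], dtype=torch.float64)
    action = constrain(raw, lo, hi)
    action.sum().backward()
    return action.item(), raw.grad.item()
```

Calling `grad_wrt_raw(clamp_action, 2.0, -1.0, 1.0)` shows that torch.clamp passes a gradient of 1 inside the interval and 0 outside it. Passing `make_projection()` in place of `clamp_action` evaluates the same points through the cvxpylayer, so the two gradients can be compared directly, including at the bounds.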

All the best,
Philipp

[Figure: the gradient of both functions]

[Figure: forward pass of both functions]