jump-dev/MathOptFormat

Support parameters

Closed this issue · 14 comments

odow commented

From Brad Bell of CppAD:

The CppAD expression trees represent a non-linear function with a set of independent and dependent
variables. In addition they have constant and dynamic parameters. A constant parameter cannot
change. A dynamic parameter that can change, but no derivatives are computed with respect to it. A
variable can change and derivatives are computed w.r.t. it. The dynamic parameters are placed in the
same address space as the constant parameters. The variables are in a separate address space that is
used for all the forward and reverse mode derivatives (starting with order zero).

Do you think the MathOptFormat will support the representation of
non-linear functions (as described above) as some substructure ?

I believe all we need to do is define:

{"head": "parameter", "name": "my_unique_name", "default": 1.0}

This is something that will be very useful in some work I am about to propose: how to record parameterized optimization models. Provided the structure remains the same, these parameters would be the ticket.

I am not sure what you mean by 'parameter' here. CppAD uses the following notation:

  1. A constant parameter, this is a value that will never change for the life of this graph.
  2. Dynamic parameters, there are independent dynamic parameters and once they are determined
    the value of the other dynamic parameter are determined by the graph. No derivatives are determined for the dynamic parameters.
  3. Variables, there are independent variables, once these variables, and all the dynamic parameters, are determined, the value of the other variables, and their derivatives, can be determined.
odow commented

The code currently in odow/MathOptFormat.jl#26 already covers cases (1) and (3):

  1. The leaf node: {"head": "real", "value": 1.0}
  2. ???
  3. The leaf node: {"head": "variable", "name": "x"}

The proposal above is for your case (2). It has a name (which is in a different namespace to the variables) and a default value. We will also need to define how to specify different values for the parameters.

One wrinkle for proposing "head": "real" instead of "head": "constant" is that we also want to support complex numbers. So maybe we have two types of dynamic parameters:

{
    "head": "real_parameter", 
    "name": "unique_parameter_name",
    "default_value": 1.0
}

{
    "head": "complex_parameter", 
    "name": "unique_parameter_name_2", 
    "real": 1.0, 
    "imag": 1.0
}

(Open to naming suggestions.)

This brings up an important consideration when doing AD of complex functions:

There is a choice to represent it as a pair of real functions, or as one complex function.

As a complex function, the complex conjugate is not differentiable, but it is differentiable as a pair of real functions.

odow commented

This brings up an important consideration when doing AD of complex functions:

I guess a decision to make is whether the format is immediately amenable to AD, or whether it just records an expression graph in a compact manner. I tend to favor the second as one could imagine developing solvers that natively understand complex numbers. The structure of the model is "this is a complex number." It is up to the solver/AD code to make any transformations (e.g., converting to polar form) necessary in order to solve it.

I think it would be best to have an abstract type, say 'Float' that satisfies certain properties and is not necessarily real or complex. In fact, it is useful in AD to use the same graph for different types.

There is another issue with the CppAD computational graphs. It is possible to designate a certain functions as atomic; i.e., its sub-graph is kept separate and does not get expanded as part of the current graph. This has certain advantages and disadvantages. For example, common sub-expressions in the sub-graph and the parent graph cannot be recognized.

There needs to be a data structure for this representation that is language independent, so that python, julia, or C++ routines can pass this information without having the overhead of file I/O.

odow commented

I think it would be best to have an abstract type, say 'Float' that satisfies certain properties and is not necessarily real or complex. In fact, it is useful in AD to use the same graph for different types.

If the file format can parse numbers back to their proper type, sure. But, for example, while JSON can parse "value": 1 and "value": 1.0 back to ints and floats, it doesn't have the same support for complex numbers.

There is another issue with the CppAD computational graphs. It is possible to designate a certain functions as atomic; i.e., its sub-graph is kept separate and does not get expanded as part of the current graph. This has certain advantages and disadvantages. For example, common sub-expressions in the sub-graph and the parent graph cannot be recognized.

Currently the design treats every function within a model atomically. But we probably want to store the list of nodes in the graph in some common location so sub-expressions in different constraints can be recognised. Ref: odow/MathOptFormat.jl#26 (comment)

There needs to be a data structure for this representation that is language independent, so that python, julia, or C++ routines can pass this information without having the overhead of file I/O.

Agree. But we aren't defining our own binary format. We need to find an efficient, widely supported, existing format. Ref: https://github.com/odow/MathOptFormat.jl/issues/6#issuecomment-436797993

Is there an advantage of keeping track of different types in the non-linear functgion, instead of doing coercion to one floating point type of the space that the function is mapping. For examole if f: R^n -> R^m would we use a different type for each element of the domain and range in graph representation ? I think this would get complicated and it would be better to choose one type for the function.

Something to think about here is should the representation include only floating point operations, like an AD tape, or should it represent the algorithm; e.g., should it include things like:
if expression then statement_list else statement_list

odow commented

Coming back to this because I want to move forward with a stochastic file format and parameters are needed. cc @joaquimg

In addition, @tomasfmg is exploring parameters in MOI here: https://github.com/tomasfmg/ParametricOptInterface.jl

Design options

Option 1: Parameters are just special variables

Pros:

  • Easy to implement.
  • Logical extension from MOI

Cons:

  • Parameters in variable bounds cause problems:
model = Model()
@parameter(model, p == 1)
@variable(model, x >= 2 + p)
# really interpreted as
@variable(model, x)
@constraint(model, x - p >= 2)

Could work around this by always inferring constraints as variable bounds if possible.

  1. Process all SingleVariable-in-Set constraints
  2. If a ScalarAffineFunction-in-Set constraint exists with a single (non-parametric) variable with a coefficient of 1.0 and when interpreted as a bound is still valid, then add as a bound.

Option 1a: Parameter set

Define a new Parameter set. Parameters are just variables with the constraint

{
    "function": {
        "head": "SingleVariable", "variable": "p"
    },
    "set": {
        "head": "Parameter", "default": 1.0
    }
}

Option 1b: Parameter attribute

Define a parameter_value attribute:

"variables": [
    {"name": "x"}, 
    {"name": "p", "parameter_value": 1.0}
]

Or

"variables": [
    {"head": "variable", "name": "x"}, 
    {"head": "parameter", "name": "p", "default": 1.0}
]

Option 2: Separate parameters

"variables": [{"name": "x"}],
"parameters": [{"name": "p", "default": 1.0}]

Ideally, these parameters are then accepted anywhere that a real scalar is accepted.

Pros:

  • Explicit.

Cons:

  • We don't have a concept for this in MOI.
  • When parsing, we have to check if every number is either a number or a struct.
  • What is I want to write the following? Can we have expressions of numbers and parameters?
model = Model()
@parameter(model, p == 1)
@variable(model, x >= 2 + p)
odow commented

https://odow.github.io/StochOptFormat/ has taken the view that parameters are just variables. We give a list of the parameters (random variables) in the outer layer of metadata.

We can probably make this work, with extra code in the future:

model = Model()
@parameter(model, p == 1)
@variable(model, x >= 2 + p)

even in option 1

odow commented

I don't think we're going to do this for now. It's a tricky thing to get right.