Comparison with SparseAxisArray
Opened this issue · 6 comments
I was interested in how we compare with the default SparseAxisArray in JuMP.
A naive implementation, using flow = m[:flow]
is terrible, because flow
can't be typed correctly. If we use a function barrier, it's still a problem, because the in-liner removes the function barrier for some reason. But if you use @noinline
, then things are nicer. (It's also the reason for the 8 sec in "Test dictionary".)
The filter introduces some function call overhead, and the alternative is slightly cheating, because it reduces to a single iteration through the keys of flow
.
julia> test()
-- Test SparseAxisArray --
Variable creation: 0.027661869 [nv=2160]
Constraint creation: 0.519338562 [nc=9093]
-- Test dictionary --
Variable creation: 0.005746578 [nv=2160]
Constraint creation: 7.898748765 [nc=9093]
-- Test dictionary filter--
Variable creation: 0.005215764 [nv=2160]
Constraint creation: 0.705254153 [nc=9093]
-- Test dictionary alternative --
Variable creation: 0.005294635 [nv=2160]
Constraint creation: 0.013705239 [nc=9093]
-- Test sparse array slice--
Variable creation: 0.004835772 [nv=2160]
Constraint creation: 0.059691706 [nc=9093]
-- Test sparse array select --
Variable creation: 0.005393562 [nv=2160]
Constraint creation: 0.593879205 [nc=9093]
-- Test sparse array with cache --
Variable creation: 0.005129799 [nv=2160]
Constraint creation: 0.039051761 [nc=9093]
-- Test sparse array with cache (check empty) --
Variable creation: 0.005233172 [nv=2160]
Constraint creation: 0.037110871 [nc=2563]
-- Test indexed table --
Variable creation: 0.006558805 [nv=2160]
Constraint creation: 0.008344676 [nc=2482]
So the most interesting target is the sparse slice.
function create_vars_SparseAxisArray(m, pp)
return @variable(
m,
flow[
f = pp.factories,
c = pp.customers,
p = pp.products,
t = pp.periods;
haskey(pp.demand, (c, p, t)) && canproduce(pp, f, p)
] >= 0,
)
end
function create_constraints_SparseAxisArray(m, pp)
flow = m[:flow]
_create_constraints_SparseAxisArray(m, pp, flow)
end
@noinline function _create_constraints_SparseAxisArray(m, pp, flow)
# Production capacity
for (f, t) in keys(pp.prodcap)
@constraint(
m,
sum(
flow[f, c, p, t] for
(ff, c, p, tt) in eachindex(flow) if ff == f && tt == t
) ≤ pp.prodcap[f, t]
)
end
# Customer demand
for (c, p, t) in keys(pp.demand)
@constraint(
m,
sum(
flow[f, c, p, t] for
(f, cc, pp, tt) in eachindex(flow) if cc == c && pp == p && tt == t
) ≤ pp.demand[c, p, t]
)
end
# Transport capacity
for (f, c) in keys(pp.flowcap), t in pp.periods
@constraint(
m,
sum(
flow[f, c, p, t] for
(ff, cc, p, tt) in eachindex(flow) if ff == f && cc == c && tt == t
) ≤ pp.flowcap[f, c]
)
end
end
function test_SparseAxisArray(pp)
println("-- Test SparseAxisArray --")
m = Model()
t1 = @elapsed create_vars_SparseAxisArray(m, pp)
println("Variable creation: $t1 [nv=$(num_variables(m))]")
t2 = @elapsed create_constraints_SparseAxisArray(m, pp)
println(
"Constraint creation: $t2 [nc=$(num_constraints(m, AffExpr, MOI.LessThan{Float64}))]",
)
return println()
end
Ah. One reason for the fast slice is that slicing returns a Vector
, which is only reasonable if you want to call sum
on it directly. We'd lose indexing more generally.
julia> flow[5, :, 7, 5]
6-element Vector{VariableRef}:
flow[5, 32, 7, 5]
flow[5, 71, 7, 5]
flow[5, 8, 7, 5]
flow[5, 93, 7, 5]
flow[5, 56, 7, 5]
flow[5, 41, 7, 5]
julia> flow[5, :, :, 5]
12-element Vector{VariableRef}:
flow[5, 37, 16, 5]
flow[5, 32, 7, 5]
flow[5, 71, 7, 5]
flow[5, 31, 16, 5]
flow[5, 29, 16, 5]
flow[5, 8, 7, 5]
flow[5, 82, 16, 5]
flow[5, 93, 7, 5]
flow[5, 56, 7, 5]
flow[5, 40, 16, 5]
flow[5, 93, 16, 5]
flow[5, 41, 7, 5]
Thanks for looking into the details. We have also been looking a bit into SparseAxisArray as an alternative. The speed-up we get over SparseAxisArray is in my view due to two main factors:
- The construction of sparse variables with multiple indices in SparseAxisArray is inefficient due to the fact that you can only filter on conditions after looping through all index sets. Scaling up the problem sizes we ran into considerable time issues.
- I think the main contribution to speedup in the slicing, is not the returning of a Vector, but that we perform quite aggressive caching during slicing. I.e. if you slice on one or more indices, we assume that there will be more slicing on the same combination of indices and store a lookup table for all other values of indices not being sliced upon.
You can see the effect of caching by comparing the timing of these two tests that are doing the same without and with caching.
-- Test sparse array select --
Variable creation: 0.005393562 [nv=2160]
Constraint creation: 0.593879205 [nc=9093]
-- Test sparse array with cache --
Variable creation: 0.005129799 [nv=2160]
Constraint creation: 0.039051761 [nc=9093]
If we are to consider JuMP in some of our industrial settings with large LP problems and hard time requirements, the setup time of the model is an important issue (e.g. problems with a few millions variables, constraints and non-zeros that are required to be run every five minutes). In these settings, model creation can typically take up to 50 % of the time even on finely tuned models in FICO Xpress Mosel.
Hmm. Yeah the caching is an interesting optimization. I don't know if it's something we'd want to add by default in JuMP because it could lead to weird memory issues on unsuspecting users.
And for the hard time requirements: couldn't you build once in Xpress and then modify subsequent problems? I assume a lot of the structure stays the same, and it's just RHSs changing?
I think most commercial systems use some form of caching to speed up constraint generation. If you have a look at the docs of e.g. the Python API of Gurobi, you will find they have specialized tupledict/tuplelist with some internal data structures to be used for efficient select procedures:
https://www.gurobi.com/documentation/9.5/refman/py_tupledict.html
To some extent we can modify problems, but at some stage it is more work to keep track of modifications than to just build the model from scratch. In these dynamic supply chain problems you get new and modified orders that will affect a lot of the structure also in the constraint matrix.
Thanks for following up on this @odow !
I can add to @trulsf 's first point that I had a look into sparse variable construction a while ago, and it is possible to do more efficiently, but very inconvenient (https://github.com/hellemo/SparseHelper.jl). Incremental variable construction like with insertvar!
has the added benefit of opening for new ways to write modular code (delegating responsibility to add variables).
I've considered looking into wrapping the result of a slicing operation in a new type (SubSparseVar?) which could be used for further indexing, or perhaps for specialized sum or other methods.
A quick performance update:
We have updated the benchmarks from the talk with a new type IndexedVarArray
in v0.6.2, which in addition to adding some new functionality largely closes the performance gap to the incremental model building (model_indexed
). This is the same caching idea with a bit more efficient implementation.
We also included the newly added support for slicing of SparseAxisArray
to the benchmark (model_sparse_aa
). Unfortunately, it seems like the repeated construction of Dicts for each lookup/slicing gives some performance issues. In fact we had to reduce the problem size to make the benchmark complete in a reasonable time:
https://github.com/hellemo/SparseVariables.jl/blob/main/benchmark/res.svg
https://github.com/hellemo/SparseVariables.jl/blob/main/benchmark/sparsity.svg