QuantEcon/QuantEcon.jl

gridmake speed

Opened this issue · 8 comments

albop commented

I have tried to write a version of gridmake that would output a column-major ordering. The following function does it:

function ggmake(nodes...)
    res = [Base.product(nodes...)...]
    reinterpret(Float64, res, (length(res[1]), length(res)))
end
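(On Julia 0.7/1.0, where the three-argument reinterpret is gone, a rough equivalent might look like the sketch below; ggmake_v1 is just a placeholder name and, like the original, it assumes every node holds Float64 values.)

function ggmake_v1(nodes...)
    # column-major enumeration of the Cartesian product, flattened to a vector of tuples
    res = vec(collect(Iterators.product(nodes...)))
    # view the tuple storage as Float64 and reshape to (n_dims, n_points);
    # assumes, like the original, that all inputs contain Float64 values
    reshape(reinterpret(Float64, res), length(first(res)), length(res))
end

For example, ggmake_v1([0.0, 1.0], [0.5, 1.5]) should give the same 2x4 layout that ggmake produces on 0.6.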

A nice thing about the product generator is that it works with any kind of argument, as long as each is an iterable collection of floats. For instance, ggmake((1.,2.), [0.1,0.2], linspace(0,1,100)) is perfectly valid.

I was then surprised to see that it is significantly faster than the one shipped in QuantEcon:

julia> a1, a2, a3 = randn(10), randn(10), randn(10);
julia> @btime ggmake(a1,a2,a3);
  26.409 μs (1016 allocations: 71.22 KiB)
julia> @btime gridmake(a1,a2,a3);
  1.267 ms (11501 allocations: 289.47 KiB)

This remains true even if one adds a transposition at the end to replicate the output of the current gridmake.

@btime ggmake(a1,a2,a3)';
  34.871 μs (1018 allocations: 94.73 KiB)

That makes me wonder whether the extra complexity is worth it, given that the code in https://github.com/QuantEcon/QuantEcon.jl/blob/master/src/util.jl is a fair bit more involved.

Also, the timings in #104 don't seem consistent with the current measurements. Is there a regression?

albop commented

As always, I sent the comment before I was done editing it. It is finished now.

Wow, this is great. Faster and simpler. I'm all for this change.

What should the API look like for choosing tall vs. wide output? (They are both column-major IMO, because even in the tall form the first column's values move fastest.)

We could have two functions with slightly different names, or we could have a Bool keyword argument. I'm open to suggestions.
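To make the keyword option concrete, here is a minimal sketch (the wide keyword and the delegation to ggmake are illustrative only, not a settled API):

function gridmake(nodes...; wide::Bool = false)
    g = ggmake(nodes...)               # n_dims x n_points, column-major
    return wide ? g : permutedims(g)   # default keeps the current tall layout
end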

I may have been a bit too eager...

Two things to be aware of:

  1. We shouldn't hard-code Float64.
  2. We need to preserve the current functionality when an AbstractMatrix is passed as input (see below):
julia> x = [1.0 2.0; 3.0 4.0];

julia> y = [100.0; 200.0]
2-element Array{Float64,1}:
 100.0
 200.0

julia> gridmake(x, y)
4×3 Array{Float64,2}:
 1.0  2.0  100.0
 3.0  4.0  100.0
 1.0  2.0  200.0
 3.0  4.0  200.0

julia> ggmake(x, y)'
8×2 Array{Float64,2}:
 1.0  100.0
 3.0  100.0
 2.0  100.0
 4.0  100.0
 1.0  200.0
 3.0  200.0
 2.0  200.0
 4.0  200.0
albop commented
  1. Indeed. There is still the constraint that the data must be homogeneous. It's a tricky one, since there is no abstract iterable type. One could use AbstractArray, but that is less general. I don't know what is meant by "traits" in this discussion: JuliaLang/julia#23429. (A sketch of deriving the element type instead of hard-coding Float64 is below, after this list.)
  2. I haven't thought carefully about that one. But maybe it isn't that useful, if it is not used for recursively defining Cartesian products.
    As for the transposition, it seems to me that gridmake has a distinctive CompEcon connotation, and it would be confusing to have it return a column-major result. And gridmake(..., true) for Julia-friendly ordering would seem strange to me, since that should be the default.
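On point 1, a rough sketch of deriving the output element type from the inputs rather than hard-coding Float64 (ggmake_t is a made-up name, and this is just one possibility):

# Sketch only: promote the element types of the inputs instead of assuming Float64.
function ggmake_t(nodes...)
    T = promote_type(map(eltype, nodes)...)               # e.g. Int and Float64 -> Float64
    tuples = vec(collect(Iterators.product(nodes...)))    # column-major enumeration
    out = Matrix{T}(undef, length(nodes), length(tuples))
    for (j, t) in enumerate(tuples), (i, v) in enumerate(t)
        out[i, j] = v
    end
    return out
end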

This may be related or may be orthogonal: yesterday I found myself wanting gridmake(T::Type, arrays...) to force the eltype of the output to be Float64 regardless of the eltype of arrays (one could use gridmake!(out, arrays...) after computing the shape of out from arrays, but that would be redundant, since it is already done internally in gridmake).
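For what it's worth, a minimal sketch of that T-first method, assuming gridmake!(out, arrays...) fills out in place and returns it (vector inputs only; the body is an assumption, not the current implementation):

function gridmake(::Type{T}, arrays::AbstractVector...) where T
    # tall layout: one row per grid point, one column per input array
    out = Matrix{T}(undef, prod(length.(arrays)), length(arrays))
    return gridmake!(out, arrays...)
end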

I tried a few approaches, and I think the mapreduce(f, hcat, zip(arrays, cumprod)) version offered the same order of efficiency with simpler code.

@Nosferican can you point us to the mapreduce version?

I think it goes something like this:

using BenchmarkTools: @btime

const x, y, z = 1:3, [10 20; 30 40], [100, 200];

arrays = [x, y, z];

function gridmake(arrays::AbstractVecOrMat...)
    l = size.(arrays, 1)
    nrows = prod(l)
    # For each array, repeat its rows `inner` times (the cumulative product of the
    # preceding row counts) and tile the whole block `outer` times to reach `nrows`
    # rows; hcat stitches the per-array columns together.
    output = mapreduce(a_o -> repeat(a_o[1],
                                     inner = (a_o[2], 1),
                                     outer = (div(nrows, size(a_o[1], 1) * a_o[2]), 1)),
                       hcat,
                       zip(arrays, cumprod(prepend!(collect(l[1:end - 1]), 1))))
    return output
end

@btime gridmake($arrays...)
9.045 μs (99 allocations: 5.84 KiB)

The original implementation benchmarked at 8.516 μs (43 allocations: 2.36 KiB) in that case.
I also tried Base.Iterators.product with Base.Iterators.flatten, which gives a one-liner, but it was less efficient. Those were benchmarked on v"1.0.0".
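For reference, that kind of one-liner might look roughly like this (vector inputs only; not necessarily the exact code benchmarked):

# Rough shape of the product/flatten approach; slower than the mapreduce version above.
gridmake_flat(arrays::AbstractVector...) =
    permutedims(reshape(collect(Iterators.flatten(Iterators.product(arrays...))),
                        length(arrays), :))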