JuliaLang/julia

Late-evaluating method definitions

timholy opened this issue · 9 comments

In the process of introducing Cartesian into base, I found myself wishing for a notation like this:

function getindex(A::AbstractArray, I::Int...N)
    <code>
end

The idea here is that the last argument is not really a splat (and doesn't create a tuple); it's a promise to the compiler that the method will be created for any fixed N, for example a method for N=3 of

getindex(A::AbstractArray, I_1::Int, I_2::Int, I_3::Int)

This might help us get away from needing wrapper functions that use a dictionary _getindex_cache to implement multidimensional methods.

A more complicated example would be auto-generating the slice functions described in #5394: in the absence of a late-evaluating method definition, to support slices where P is the dimensionality of the parent and C of the child would require binomial(P,C) method definitions. Of course, users are likely to need only a tiny fraction of these in any given session. The notation here might be similar,

getindex{T,A,I<:(RangeIndex...P)}(s::SubArray{T,C, A, I}, I::Int...C)

Note that there are two late-evaluating parameters in this case, P and C.

I'm not sure that the notation ...N covers all the potential uses of this idea, but it appears to be a start.

This may be mostly or entirely redundant with #3706, and if so feel free to close this.

Would this be addressed just by making varargs functions faster and making it possible (or the default) to specialize them for any number of arguments?

It would cover many usages, but I'm not sure it would cover all. For example, the suggestion in #5394 appears to require some way of generating

s.parent[s.indexes[1], s.indexes[2][i]]

when s is a SubArray{T,1,A,(Int,Range1{Int}) but

s.parent[s.indexes[1][i], s.indexes[2]]

when s is a SubArray{T,1,A,(Range1{Int},Int) (note that the indexes parameters have been swapped). In other words, the body of the method definition might need to be a code-generator that knows the types of the inputs.

The T...N may be a red herring. Perhaps a better way is to introduce the notion of a latefunction, with the following rules:

  • They return an Expr
  • The Expr serves as the body for a function with signature equal to typeof(args...), where args is the tuple of all arguments passed from the caller
  • The resulting function is compiled and added to the method table.
  • The new method is used to evaluate all calls (this one, and all future ones) with that specific signature.

The only advantage of the T...N syntax is it shows which parts of the method signature will require a new method to be emitted; however, if compilation is the slow part anyway, perhaps one wouldn't mind having a new Expr emitted for each slight difference in input types (e.g., Int32 vs Int64, rather than worrying about generating a method signature containing generic Integers).

Here's a complete example for the "difficult" case of #5394:

latefunction getindex{T,A,I<:(RangeIndex...)}(s::SubArray{T, N, A, I}, J::Real...)
    args = {}
    i = 1  # index into I
    j = 1  # index into J
    for i = 1:length(I)
        if I[i] <: Integer
            push!(args, :($s.indexes[$i]))
        else
            Jj = Expr(:quote, J[j])
            push!(args, :($s.indexes[$i][$Jj]))
            j += 1
        end
    end
    # Trailing 1s, etc...
    for j = j+1:length(J)
        Jj = Expr(:quote, J[j])
        push!(args, :($Jj))
    end
    Expr(:ref, s.parent, args...)
end

Aha! In the past I have called this a "staged method" (though not everybody loves that term). There is a demo implementation here:

https://github.com/JuliaLang/julia/blob/master/examples/staged.jl

+1 for staged functions.
For advanced usage it would be good to have a way to specify a more general method signature that the newly generated method should get (for efficiency). The only trouble with this is that the user has to make sure that all invocations covered by that signature will produce it, for consistency and to avoid ambiguity warnings. It might make sense to divide the process into one function that generalizes the method signature, and another that produces the method body based on the signature (the part that you already get with staged.jl), since it the latter must only depend on the signature that is actually used.

Right, I belatedly realized that I should have used that term. The main problem is that, if I understand correctly, there's no good way to generate a staged function on demand, and that's really what this issue is focused on. I'm trying to deal with the combinatorial complexity of something like #5394, where C-dimensional slices from a P-dimensional array require binomial(P,C) methods, and of course P and C can be arbitrary; creating all possible methods for arrays up to P dimensions therefore requires 2^P-1 methods, and if you pre-generate all methods up to dimension P what do you do if the user passes a P+1 dimensional array? I'm well aware of, and use, the traditional _mymethod_cache = Dict() option---indeed the new @ngenerate macro builds such wrapper functions automatically---but this approach has negatives that I'm sure you're well aware of.

However, I realize that there is a possible solution that someone may already be working on, which turns out to be #265. Perhaps we should continue this discussion there.

A "real" implementation of staged methods would indeed hook into julia's method tables. Where we currently call type inference to transform the function and cache the result, we'd first call the user-written transformation. This can be thought of as writing a compiler extension; everything else in the language stays the same but the compiler gets a bit smarter.

This would take Julia's meta-programming capability to a new height.