JuliaLang/julia

NamedTuple unpacking (and also struct unpacking)

mauro3 opened this issue · 33 comments

During a JuliaCon discussion with @ahojukka5, I figured it would be cool to have NamedTuple unpacking in function arguments that honors the names of the named tuple. Currently we have:

julia> nt = (a=1, b=2)

julia> f((b,)) = b                                                                                                                                        
f (generic function with 1 method)                                                                                                                          

julia> f(nt)                                                                                                                                                 
1                                                                                                                                                            

instead, I'd think this would be more intuitive:

julia> f(nt)                                                                                                                                                 
2

Similarly it could work for structs:

julia> struct A; a; b end                                                                                                                                    
                                                                                                                                                             
julia> aa = A(1,2)                                                                                                                                            
A(1, 2)                                                                                                                                                      
                                                                                                                                                             
julia> f(aa)                                                                                                                                                  
2

(Unfortunately, this needs a 2.0 label)

Edit: fixed unneeded/confusing default-value method-defs.

(b,) already means positional unpacking, so the syntax would have to be (; b,) or (b=b,).

That would then look like:

f((;a, )) = 2a
# or
f(x, y, (;a, b)) = x+y+a+b

Could be good. The only potential problem could be that it would be easy to miss the ; and then silently do positional unpacking.
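For comparison, the two forms can be tried side by side. This is only a minimal sketch, assuming Julia 1.7 or later (where the `(; b)` form is accepted in function arguments); the names `first_positional` and `by_name` are made up for illustration.

```julia
nt = (a=1, b=2)

# Positional unpacking: `b` is bound to the first element, whatever its name.
first_positional((b,)) = b

# Property destructuring: `b` is bound to `nt.b`.
by_name((; b)) = b

first_positional(nt)  # 1
by_name(nt)           # 2
```

Dropping the `;` changes the result without raising any error, which is the pitfall mentioned above.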

The (b=b,) syntax seems a bit verbose.

As a bonus, this would be non-breaking.

The (b=b,) syntax seems a bit verbose.

No more verbose than the calling syntax and it would allow restructuring the named value to a variable with a different name. It's currently pretty common to see stuff like this:

something = ...
f(a, b, c; something=something)

If we had a shorthand for the case where the local name matches the keyword name then it would also make sense to use that same syntax for destructuring when the names match. Spitballing, it could be:

f(a, b, c, =something)
f(a, b, c, something=)

I think I prefer the former.

It has been proposed before (I believe by @davidanthoff ) to make (; x) shorthand for (; x = x). In fact there is commented-out code for it in julia-syntax.scm. Also (; a.x) can be shorthand for (; x = a.x). These are currently syntax errors.
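A minimal sketch of both shorthands, assuming a Julia version where they parse (1.5 or later, to my understanding); `scaled_by` and its keyword `factor` are hypothetical names used only for illustration.

```julia
x = 1
a = (x=10, y=20)

nt1 = (; x)      # same as (; x = x)
nt2 = (; a.x)    # same as (; x = a.x)

# The same shorthand applies when passing keyword arguments.
scaled_by(v; factor) = v * factor
factor = 3
scaled = scaled_by(2; factor)   # same as scaled_by(2; factor=factor)
```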

I use myfunction(x; keyword::T) .. myfunction_doingless(y, keyword=keyword) .. end with frequency.

when we are free to use some unicode as shorthand

xโฅ โ‰ x = x

(with frequency .. about 80GHz today)

Also (; a.x) can be shorthand for (; x = a.x).

How would it be known what a is when the function is called?

IMO, this feature is orthogonal to kwargs. It makes sense to name-unpack the fields of a positional argument (using the syntax of #28579 (comment)):

f(x, y, (;a, b); kw1=1, kw2=y) = ...

f(1,2, (a=3, b=8, u=9))

and also name-unpack a kwarg:

g(x, y; kw1=1, (;a, b)=p, kw2=y) = ...

g(1,2, p=(a=7, b=9, u=8))

Thus a syntax working for both should be found. I think the above seems reasonable.
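A rough sketch of the positional case, assuming Julia 1.7 or later. I am not aware of the keyword form `(;a, b)=p` being accepted, so for `g` the sketch unpacks `p` in the body instead; the function bodies are invented for illustration.

```julia
# Name-unpack a positional argument, alongside ordinary keyword arguments.
f(x, y, (; a, b); kw1=1, kw2=y) = x + y + a + b + kw1 + kw2

f(1, 2, (a=3, b=8, u=9))    # 1 + 2 + 3 + 8 + 1 + 2 = 17

# For a keyword argument, unpack by name inside the body.
function g(x, y; kw1=1, p, kw2=y)
    (; a, b) = p
    return x + y + a + b + kw1 + kw2
end

g(1, 2, p=(a=7, b=9, u=8))  # 1 + 2 + 7 + 9 + 1 + 2 = 22
```

The extra field `u` is simply ignored in both cases, since only the named properties are read.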

tpapp commented

Somewhat related: I am experimenting with a package called EponymTuples.jl, which allows replacing

f((a, b)::NamedTuple{(:a, :b), <: Tuple{Any, Int}}) = ...

(a = a, b = b, c = 3)

with

f(@eponymargs(a, b::Int)) = ...

@eponymtuple(a, b, c = 3)

I find it helpful for cases when I don't want to introduce and name a struct for passing around a large number of parameters.

I happily noticed recently in the NEWS.md for 1.5 that #34331 was merged, which I find super useful.

At this point the language seems perfectly set up to implement the Issue here, with the syntax being that a named tuple on the LHS means unpacking into those names. It gives a great symmetry between packing/unpacking named/unnamed arguments:

# pack named or unnamed arguments
foo(x,y)
foo(;x,y)

# unpack named or unnamed arguments
(x,y) = bar()
(;x,y) = bar()

If getproperty is used to do the unpacking, then it works for structs and NamedTuples and is type stable, and you can always do (;x,y) = (;dict_like...) for other stuff. Or a separate interface can be defined like @mauro3's @unpack.

A nice testament to the consistency of all of this is that you would have that both of these are valid and a no-op as you'd expect:

# valid but no-op
(x,y) = (x,y)
(;x,y) = (;x,y)

Would be great if something like this could be considered.
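The pack/unpack symmetry can be sketched as follows, assuming a Julia version that accepts `(; x, y)` on the left-hand side (1.7 or later, as far as I know). `Point` and the `from_*` variables are hypothetical names that only exist to record the results.

```julia
struct Point
    x::Int
    y::Int
end

x, y = 1, 2
t  = (x, y)      # pack by position -> Tuple
nt = (; x, y)    # pack by name -> NamedTuple (x = 1, y = 2)

(x, y) = t       # unpack by position
(; y, x) = nt    # unpack by name; the order of the names does not matter
from_nt = (x, y)

(; x, y) = Point(3, 4)   # goes through getproperty, so structs work too
from_struct = (x, y)

# Dict-like data with Symbol keys can be splatted into a NamedTuple first.
d = Dict(:x => 5, :y => 6)
(; x, y) = (; d...)
from_dict = (x, y)
```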

Agreed. I'm particularly interested in the destructuring of structs and named tuples in function arguments. If we had that feature, then we would have ~70% of the pattern matching capabilities of ML languages. The only parts we would be missing are

  • pattern matching on values rather than types
  • destructuring a collection into the head and tail of the collection (useful for recursion)

Someone show me exactly what should happen given what?

Good question. The original post seems to focus on unpacking in function arguments, which is the part I'm most interested in. So, the syntax might look like the following:

Named tuple unpacking in positional function arguments

# define
foo(q, (; x, z)) = q + x + z

# call
foo(1, (x=2, y=3, z=4))  # returns 7

Struct unpacking in positional function arguments

I think struct unpacking in function arguments requires a separate syntax. Perhaps something like this:

struct A
    x
    y
    z
end

# define
bar(q, A x z) = q + x + z

# call
a = A(2, 3, 4)
bar(1, a)  # returns 7

@JeffreySarnoff, the idea is like this:

struct T
    x::Int
    y::Float64
end

t = T(10, 3.14)
nt = (a=1, b=2, c=3)

(;x, y) = t
(;a, b, c) = nt

so the (;x, y) syntax on the LHS is a way to declare/make several variables that get their value by calling getproperty(x, :sym) on the RHS.
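Because the unpacking goes through getproperty, computed properties can be unpacked as well. A minimal sketch, assuming Julia 1.7 or later; the `Rect` type and its `area` property are invented for illustration.

```julia
struct Rect
    w::Int
    h::Int
end

# Expose a computed property `area` next to the real fields.
function Base.getproperty(r::Rect, name::Symbol)
    name === :area && return getfield(r, :w) * getfield(r, :h)
    return getfield(r, name)
end

# `area` is not a field, but it can still be unpacked by name.
(; w, area) = Rect(3, 4)
```

Here `w` ends up as 3 and `area` as 12.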

JavaScript has this (in recent versions).

@CameronBieganek just want to highlight that

I think struct unpacking in function arguments requires a separate syntax.

doesn't need to be the case.

Since Julia already lowers argument unpacking from

foo((x,y),) = ...

to something like

foo(tmp) = ((x,y) = tmp; ...)

then it would be natural that foo((;x,y),) = ... became foo(tmp) = ((;x,y) = tmp; ...), and then if (;x,y) were allowed on the LHS with the meaning proposed above, then your struct unpacking into positional arguments would work exactly right.
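To make the lowering concrete, here is a small sketch, assuming Julia 1.7 or later. `foo_lowered` is a hand-written stand-in for what the lowering would produce, not actual compiler output.

```julia
struct A
    x
    y
    z
end

# Destructuring directly in the argument list.
foo(q, (; x, z)) = q + x + z

# Hand-written equivalent of the lowering described above.
function foo_lowered(q, tmp)
    (; x, z) = tmp
    return q + x + z
end

foo(1, (x=2, y=3, z=4))   # 7
foo(1, A(2, 3, 4))        # 7
```

One untyped method serves both the named tuple and the struct, since only the property names matter.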

@marius311 That's interesting, but I would want

foo(q, (; x, z))

and

foo(q, A x z)

to be two separate methods in the method table for foo. In other words, the second method doesn't just match the property names x and z, it also matches the type A.

However, I see now that the current tuple unpacking in positional arguments does not create a foo(::Tuple{Any, Any}) method, which is troubling. ☹

julia> foo((x, y)) = x + y
foo (generic function with 1 method)

julia> methods(foo)
# 1 method for generic function "foo":
[1] foo(::Any) in Main at REPL[12]:1

You could imagine being able to do,

foo(q, (; x, z) :: A) = ...

which would be totally consistent with how you can currently do

julia> foo((x, y)::Tuple{Any,Any}) = x + y
foo (generic function with 1 method)

julia> methods(foo)
# 1 method for generic function "foo":
[1] foo(::Tuple{Any,Any}) in Main at REPL[1]:1

Phew, I'm glad that exists as a workaround, but it still seems wrong to me that foo((x, y)) creates a foo(::Any) method.

In other words, foo(3) should throw a method error if the only method I've defined is foo((x, y)). Currently we get a bounds error instead:

julia> foo((x, y)) = x + y
foo (generic function with 1 method)

julia> foo(3)
ERROR: BoundsError: attempt to access Int64
  at index [2]
Stacktrace:
 [1] indexed_iterate(::Int64, ::Int64, ::Nothing) at ./tuple.jl:90
 [2] foo(::Int64) at ./REPL[1]:1
 [3] top-level scope at REPL[2]:1

I'm guessing the reason is that f((x,y)) = ... can currently also be called with anything that implements the iterator interface and unpacks into two values, e.g. f([1,2]) also works, so requiring the argument to be a Tuple in the signature would exclude that.
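The difference between untyped and typed destructuring can be sketched like this. The property form assumes Julia 1.7 or later; `foo_any`, `foo_tuple`, `bar` and `Params` are hypothetical names, and the second `bar` body is deliberately different just to show which method runs.

```julia
# Untyped: the method is foo_any(::Any), so any iterable with at least
# two elements is accepted.
foo_any((x, y)) = x + y
foo_any((1, 2))   # 3
foo_any([1, 2])   # 3

# Typed: dispatch is restricted to 2-tuples, so foo_tuple(3) is a MethodError.
foo_tuple((x, y)::Tuple{Any,Any}) = x + y

# Property destructuring can be annotated the same way, which gives
# separate methods for a struct and for named tuples.
struct Params
    x
    z
end

bar(q, (; x, z)::Params)     = q + x + z
bar(q, (; x, z)::NamedTuple) = q - x - z

bar(1, Params(2, 4))   # 7
bar(1, (x=2, z=4))     # -5
```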

Would unpack(nt::NamedTuple) that assigned (and could overwrite) symbols used as the names in nt to values given with Tuple(nt) be a viable approach? (I do not know how to tell rhs from lhs unless I have them both). If not, clarify this for me.

Bumping this old issue. Given that we have automatic keyword assignment as in f(; x, y), I think it is only logical to also support (;x, y) = (y=1, x=2). One case where this would be particularly useful is in do blocks, e.g.

args = [(;x, y) for x in 1:4, y in 5:6]
map(args) do (;x, y)
    x + y
end

(I'm here because I had a bug in a more complicated version of the above, due to unpacking arguments in the incorrect order; this could have been avoided if the proposed syntax was available).
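A sketch of how the ordering bug goes away with by-name unpacking, assuming Julia 1.7 or later; the smaller ranges and the variable names `sums` and `by_position` are only for illustration.

```julia
args = [(; x, y) for x in 1:2, y in 5:6]   # 2×2 matrix of NamedTuples

# Names listed in the "wrong" order on purpose: binding is by name.
sums = map(args) do (; y, x)
    x + y
end

# Positional unpacking binds by position, so here the variable `y`
# silently receives the field `x`, and vice versa.
by_position = map(args) do (y, x)
    x - y
end
```

With positional unpacking the swapped names go unnoticed and `x - y` comes out positive, which is the kind of silent bug described above.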

This is how it's done in JavaScript:

const nt = { a: 1, b: 2 }
const f = ({ b }) => b
f(nt)
2

Also:

const c = 3
const d = 4
{ c, d }
{ c: 3, d: 4 }

Worth adding, this also works for variable assignment (not in a function argument):

const nt = {a: 1, b:2}
const {b, a} = nt  // a = 1; b = 2

Yes, I'd like to reiterate that I'm particularly interested in being able to destructure (pattern match) a named tuple or a struct in a function argument, as was requested by OP. When you combine multiple dispatch with struct destructuring in function arguments, you get pretty close to the pattern matching capabilities of a language like Haskell.

It's also possible to destructure deeper objects and rename parameters.

const T = {
  a: 1,
  b: 2,
  c: { d: 3, e: 4 },
};

const f = ({ b: paramB, c: { d: paramD, e } }) => paramB * paramD * e;

console.log(f(T)); // 24

#39285 currently only implements the more minimal version of this proposal, but we could think about allowing renaming as well, so this example could be written as:

f((; b=paramB, c=(; d=paramD, e))) = paramB * paramD * e

This should even compose nicely when nested with regular destructuring for iterators, so this would already give us quite powerful pattern matching capabilities.
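With only the minimal form available, the JavaScript example can still be written by unpacking the inner level in the body. A sketch assuming Julia 1.7 or later; the renaming is done with an ordinary assignment, not with any special syntax.

```julia
T = (a=1, b=2, c=(d=3, e=4))

# Unpack the outer level in the argument list, then the inner level
# (and any renaming) in the body.
function f((; b, c))
    (; d, e) = c
    paramB, paramD = b, d   # explicit renaming
    return paramB * paramD * e
end

f(T)  # 2 * 3 * 4 = 24
```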

I think it would make more sense to put the new name on the LHS of = to be consistent with regular assignment and keywords.

f((; paramB=b, c=(; paramD=d, e))) = paramB * paramD * e

Shouldn't this be

f((; paramB=b, (; paramD=d, e)=c)) = paramB * paramD * e

then instead? I initially thought of it that way as well, but it just seems very weird to have anything other than symbols as keys for kwarg syntax. I also think the other way makes a lot more sense, if you think about it in terms of pattern matching. Perhaps this confusion is indicative that we maybe should hold off on the more complicated cases for now though.

Yes I had the c in the wrong place, good catch. I did type that out on my phone though so I wouldn't take this error as evidence against the syntax.

I think if we were going to use the order, => might be more clear than =. DataFrames.jl does something like this with their combine function.

I initially thought renaming was unnecessary, but I then realized that without it, you wouldn't be able to apply the approach to e.g. a pair of namedtuples of the same type, as in

f((;a=>a1, b=>b1), (;a=>a2, b=>b2)) = (a=a1 + a2, b=b1-b2)
f((;a1=a, b1=b), (;a2=a, b2=b)) = (a=a1 + a2, b=b1-b2)

I imagine many cases where one would want to do this. So I think it is worth further consideration. On the other hand, if you think we could get this into 1.7, I would be all in favor of pushing forward with the fantastic improvement you've already implemented.

We can't really use => here, because that is just a function call, so something like this already has a meaning (although perhaps not super useful):

julia> (a => b,) = [42]
1-element Array{Int64,1}:
 42

julia> 1 => 2
42

What is this madness??! Can you point me to documentation that would help me understand this? I am completely at a loss.

It's the same as (=>)(a, b) = 42, so it's just a function definition, only in this case combined with argument destructuring.
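The function-definition reading can be checked without touching the global `=>`. This is a small sketch that puts the definition in a throwaway module (the name `PairDemo` is made up), so that it creates a module-local function instead of affecting `Pair` construction elsewhere.

```julia
module PairDemo
# Parsed as a short-form method definition for a new, module-local `=>`,
# equivalent to (=>)(a, b) = 42.
a => b = 42
result = 1 => 2      # calls the local definition
end

PairDemo.result      # 42
1 => 2               # outside the module, still an ordinary Pair
```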

Ahhhh, okay got it. Thanks!