c42f/Underscores.jl

@_ within the selector Fun part of DataFrames.select

ianqsong opened this issue · 6 comments

It seems currently @_ not working in the following:

using DataFrames
df = DataFrame(a=1:3, b=4:6, c=7:9);

df1 = select(df, Not(:b) => ByRow(@_(_1 + _2)))

MethodError: no method matching +(::var"#3#5", ::var"#4#6")
Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:529
Stacktrace:
 [1] top-level scope at In[18]:1

df2 = select(df, :a => @_(_ .^2))

MethodError: no method matching ^(::var"#20#21", ::Int64)
Closest candidates are:
  ^(!Matched::Float16, ::Integer) at math.jl:885
  ^(!Matched::Regex, ::Integer) at regex.jl:712
  ^(!Matched::Missing, ::Integer) at missing.jl:155
  ...
Stacktrace:
 [1] macro expansion at ./none:0 [inlined]
 [2] literal_pow at ./none:0 [inlined]
 [3] _broadcast_getindex_evalf at ./broadcast.jl:631 [inlined]
 [4] _broadcast_getindex at ./broadcast.jl:604 [inlined]
 [5] getindex at ./broadcast.jl:564 [inlined]
 [6] copy at ./broadcast.jl:830 [inlined]
 [7] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(Base.literal_pow),Tuple{Base.RefValue{typeof(^)},Base.RefValue{var"#20#21"},Base.RefValue{Val{2}}}}) at ./broadcast.jl:820
 [8] top-level scope at In[23]:1

@_ needs to be outside the function which accepts a function as an argument. Thus your first example should be:

@_ ByRow(_1 + _2)  # equiv. ByRow(+), or ByRow((_1,_2) -> _1 + _2)

The second one is harder, I think what you want is select(df, @_ Pair(:a, _ .^2)). It will work with the infix form => too, on version 2.0 of this package, but that is (probably) going to change, and will not work on master.

c42f commented

Exactly, thanks Michael.

The new DataFrames API conventions use => heavily in a way which is very specialized to DataFrames. This makes it a bit difficult to use a general purpose package like Underscores nicely with the special purpose syntax in DataFrames.

In the "True spirit of Underscores.jl", it's select which is the higher order function accepting other functions, so in principle we'd like it if placing the @_ outside of select worked (just like normal higher order functions):

df1 = @_ select(df, Not(:b) => ByRow(_1 + _2))
df2 = @_ select(df, :a =>_ .^2)

But unfortunately this doesn't seem possible(?) because select doesn't actually accept plain functions; rather it accepts functions which are annotated with symbols (Not(:b), :a), or further decorated with wrappers like ByRow.

I do wonder whether there's any way to resolve this, or whether DataFrames syntax is just too specialized for this to work in a general way.

I suppose it wouldn't be impossible to special-case => for this purpose, on grounds that it's hard to imagine someone wanting the current behaviour:

@_ fun(x, :a =>_ .^2) # fun(x, ξ -> (:a => ξ.^2))

But it is one more rule to know, which makes @_ a bit more opaque. And it may introduce other weird edge cases that I haven't thought of.

xref #12 for thinking about the rules.

c42f commented

I suppose it wouldn't be impossible to special-case => for this purpose

Certainly, but I feel like there's cases where having a closure which returns a pair might be desired. I don't have a very natural example, but I feel like it would be weird if the following didn't work:

julia> @_ map(_=>_^2+1, 1:3) |> Dict
Dict{Int64,Int64} with 3 entries:
  2 => 5
  3 => 10
  1 => 2

This isn't exactly the syntax above, and I suppose we could go to extra lengths to pattern match things which "happen to look like" DataFrames usage. But overall I feel like that will lead to trouble :)

Ah that's a good example, someone will do that for sure.

Maybe the place for a macro which understands the special syntax used with DataFrames, is DataFrames. It's possible that a @D_ with a few extra rules could be built on top of this package.

c42f commented

Maybe the place for a macro which understands the special syntax used with DataFrames, is DataFrames. It's possible that a @D_ with a few extra rules could be built on top of this package.

I agree that's for the best.

I'd be quite happy to generalize the functions which do the Expr manipulation a little, as required by DataFrames, so that DataFrames can reuse the work we've put into this package.