@_ within the selector Fun part of DataFrames.select
ianqsong opened this issue · 6 comments
It seems currently @_ not working in the following:
using DataFrames
df = DataFrame(a=1:3, b=4:6, c=7:9);
df1 = select(df, Not(:b) => ByRow(@_(_1 + _2)))
MethodError: no method matching +(::var"#3#5", ::var"#4#6")
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:529
Stacktrace:
[1] top-level scope at In[18]:1
df2 = select(df, :a => @_(_ .^2))
MethodError: no method matching ^(::var"#20#21", ::Int64)
Closest candidates are:
^(!Matched::Float16, ::Integer) at math.jl:885
^(!Matched::Regex, ::Integer) at regex.jl:712
^(!Matched::Missing, ::Integer) at missing.jl:155
...
Stacktrace:
[1] macro expansion at ./none:0 [inlined]
[2] literal_pow at ./none:0 [inlined]
[3] _broadcast_getindex_evalf at ./broadcast.jl:631 [inlined]
[4] _broadcast_getindex at ./broadcast.jl:604 [inlined]
[5] getindex at ./broadcast.jl:564 [inlined]
[6] copy at ./broadcast.jl:830 [inlined]
[7] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(Base.literal_pow),Tuple{Base.RefValue{typeof(^)},Base.RefValue{var"#20#21"},Base.RefValue{Val{2}}}}) at ./broadcast.jl:820
[8] top-level scope at In[23]:1
@_
needs to be outside the function which accepts a function as an argument. Thus your first example should be:
@_ ByRow(_1 + _2) # equiv. ByRow(+), or ByRow((_1,_2) -> _1 + _2)
The second one is harder, I think what you want is select(df, @_ Pair(:a, _ .^2))
. It will work with the infix form =>
too, on version 2.0 of this package, but that is (probably) going to change, and will not work on master.
Exactly, thanks Michael.
The new DataFrames
API conventions use =>
heavily in a way which is very specialized to DataFrames
. This makes it a bit difficult to use a general purpose package like Underscores
nicely with the special purpose syntax in DataFrames
.
In the "True spirit of Underscores.jl", it's select
which is the higher order function accepting other functions, so in principle we'd like it if placing the @_
outside of select
worked (just like normal higher order functions):
df1 = @_ select(df, Not(:b) => ByRow(_1 + _2))
df2 = @_ select(df, :a =>_ .^2)
But unfortunately this doesn't seem possible(?) because select
doesn't actually accept plain functions; rather it accepts functions which are annotated with symbols (Not(:b)
, :a
), or further decorated with wrappers like ByRow
.
I do wonder whether there's any way to resolve this, or whether DataFrames syntax is just too specialized for this to work in a general way.
I suppose it wouldn't be impossible to special-case =>
for this purpose, on grounds that it's hard to imagine someone wanting the current behaviour:
@_ fun(x, :a =>_ .^2) # fun(x, ξ -> (:a => ξ.^2))
But it is one more rule to know, which makes @_
a bit more opaque. And it may introduce other weird edge cases that I haven't thought of.
xref #12 for thinking about the rules.
I suppose it wouldn't be impossible to special-case => for this purpose
Certainly, but I feel like there's cases where having a closure which returns a pair might be desired. I don't have a very natural example, but I feel like it would be weird if the following didn't work:
julia> @_ map(_=>_^2+1, 1:3) |> Dict
Dict{Int64,Int64} with 3 entries:
2 => 5
3 => 10
1 => 2
This isn't exactly the syntax above, and I suppose we could go to extra lengths to pattern match things which "happen to look like" DataFrames usage. But overall I feel like that will lead to trouble :)
Ah that's a good example, someone will do that for sure.
Maybe the place for a macro which understands the special syntax used with DataFrames, is DataFrames. It's possible that a @D_
with a few extra rules could be built on top of this package.
Maybe the place for a macro which understands the special syntax used with DataFrames, is DataFrames. It's possible that a
@D_
with a few extra rules could be built on top of this package.
I agree that's for the best.
I'd be quite happy to generalize the functions which do the Expr
manipulation a little, as required by DataFrames, so that DataFrames can reuse the work we've put into this package.