Function chaining
shelakel opened this issue · 234 comments
Would it be possible to allow calling any function on Any, so that the value is passed to the function as the first parameter and the parameters passed to the function call on the value are appended afterwards?
ex.
sum(a::Int, b::Int) -> a + b
a = 1
sum(1, 2) # = 3
a.sum(2) # = 3 or
1.sum(2) # = 3
Is it possible to indicate in a deterministic way what a function will return in order to avoid run time exceptions?
The . syntax is very useful, so we aren't going to make it just a synonym for function call. I don't understand the advantage of 1.sum(2) over sum(1,2). To me it seems to confuse things.
Is the question about exceptions a separate issue? I think the answer is no, aside from wrapping a function body in try..catch.
The 1.sum(2) example is trivial (I also prefer sum(1,2)) but it's just to demonstrate that a function isn't owned per se by that type ex. 1 can be passed to a function with the first parameter being a Real, not just to functions that expect the first parameter to be an Int.
Edit: I might have misunderstood your comment. Dot functions will be useful when applying certain design patterns such as the builder pattern commonly used for configuration. ex.
validate_for(name).required().gt(3)
# vs
gt(required(validate_for(name)), 3)
The exceptions I was referring to are due to functions returning non-deterministic results (which is bad practice anyway). An example would be calling a.sum(2).sum(4) where .sum(2) sometimes returns a String instead of an Int but .sum(4) expects an Int. I take it the compiler/runtime is already smart enough to evaluate such circumstances - which would be the same when nesting the function sum(sum(1, 2), 4) - but the feature request would require extending said functionality to enforce type constraints on dot functions.
One of the use cases people seem to like is the "fluent interface". It's sometimes nice in OOP APIs when methods return the object, so you can do things like some_obj.move(4, 5).scale(10).display()
For me I think that this is better expressed as function composition, but |> doesn't work with arguments unless you use anonymous functions, e.g. some_obj |> x -> move(x, 4, 5) |> x -> scale(x, 10) |> display, which is pretty ugly.
One option to support this sort of thing would be if |> shoved the LHS as the first argument to the RHS before evaluating, but then it couldn't be implemented as a simple function as it is now.
Another option would be some sort of @composed macro that would add this sort of behavior to the following expression.
You could also shift responsibility for supporting this to library designers, where they could define
function move(obj, x, y)
    # move the object
end
move(x, y) = obj -> move(obj, x, y)
so when you don't supply an object it does partial function application (by returning a function of 1 argument) which you could then use inside a normal |> chain.
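A minimal sketch of that library-side pattern, for illustration only. The `Point` type and the bodies of `move` and `scale` are assumptions made up for this example; only the idea of returning a one-argument function when the object is omitted comes from the comment above.

```julia
# Hypothetical type, used only for illustration.
struct Point
    x::Float64
    y::Float64
end

# Full forms: the object is the first argument.
move(p::Point, dx, dy) = Point(p.x + dx, p.y + dy)
scale(p::Point, k) = Point(p.x * k, p.y * k)

# Curried forms: leave the object out and get back a one-argument function.
move(dx, dy) = p -> move(p, dx, dy)
scale(k) = p -> scale(p, k)

# The curried forms slot straight into an ordinary |> chain.
result = Point(1.0, 2.0) |> move(4, 5) |> scale(10)
```

Dispatch on the number of arguments keeps the two forms apart here, which is also why the approach gets awkward once a function accepts a variable number of arguments.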
Actually, the definition of |> could probably be changed right now to the behavior you're asking for. I'd be for it.
ssfrr I like the way you think! I was unaware of the function composition operator |>. I see there's recently been a similar discussion [https://github.com//issues/4963].
kmsquire I like the idea of extending the current function composition to allow you to specify parameters on the calling function ex. some_obj |> move(4, 5) |> scale(10) |> display. Native support would mean one less closure, but what ssfrr suggested is a viable way for now and as an added benefit it should also be forward compatible with the extended function composition functionality if it gets implemented.
Thanks for the prompt responses :)
Actually, @ssfrr was correct--it isn't possible to implement this as a simple function.
What you want are threading macros (ex. http://clojuredocs.org/clojure_core/clojure.core/-%3E). Unfortunately, @->, @->> and @-?>> are not viable syntax in Julia.
Yeah, I was thinking that infix macros would be a way to implement this. I'm not familiar enough with macros to know what the limitations are.
I think this works for @ssfrr's compose macro:
Edit: This might be a little clearer:
import Base.Meta.isexpr
_ispossiblefn(x) = isa(x, Symbol) || isexpr(x, :call)
function _compose(x)
    if !isa(x, Expr)
        x
    elseif isexpr(x, :call) &&           #
           x.args[1] == :(|>) &&         # check for `expr |> fn`
           length(x.args) == 3 &&        # ==> (|>)(expr, fn)
           _ispossiblefn(x.args[3])      #
        f = _compose(x.args[3])
        arg = _compose(x.args[2])
        if isa(f, Symbol)
            Expr(:call, f, arg)
        else
            insert!(f.args, 2, arg)
            f
        end
    else
        Expr(x.head, [_compose(y) for y in x.args]...)
    end
end
macro compose(x)
    _compose(x)
end
julia> macroexpand(:(@compose x |> f |> g(1) |> h('a',"B",d |> c(fred |> names))))
:(h(g(f(x),1),'a',"B",c(d,names(fred))))
If we're going to have this |> syntax, I'd certainly be all for making it more useful than it is right now. Using it just to allow putting the function to apply on the right instead of the left has always seemed like a colossal waste of syntax.
+1. It's especially important when you are using Julia for data analysis, where you commonly have data transformation pipelines. In particular, Pandas in Python is convenient to use because you can write things like df.groupby("something").aggregate(sum).std().reset_index(), which is a nightmare to write with the current |> syntax.
+1 for this.
(I'd already thought of suggesting the .. infix operator for this (obj..move(4,5)..scale(10)..display), but the |> operator will be nice too)
Another possibility is adding syntactic sugar for currying, like f(a,~,b) translating to x->f(a,x,b). Then |> could keep its current meaning.
Oooh, that would be a really nice way to turn any expression into a function.
Possibly something like Clojure's anonymous function literals, where #(% + 5) is shorthand for x -> x + 5. This also generalizes to multiple arguments with %1, %2, etc. so #(myfunc(2, %1, 5, %2)) is shorthand for (x, y) -> myfunc(2, x, 5, y).
Aesthetically I don't think that syntax fits very well into otherwise very readable julia, but I like the general idea.
To use my example above (and switching to @malmaud's tilde instead of %), you could do
some_obj |> move(~, 4, 5) |> scale(~, 10) |> display
which looks pretty nice.
This is nice in that it doesn't give the first argument any special treatment. The downside is that used this way we're taking up a symbol.
Perhaps this is another place where you could use a macro, so the substitution only happens within the context of the macro.
We obviously can't do this with ~ since that's already a standard function in Julia. Scala does this with _, which we could also do, but there's a significant problem with figuring out what part of the expression is the anonymous function. For example:
map(f(_,a), v)
Which one does this mean?
map(f(x->x,a), v)
map(x->f(x,a), v)
x->map(f(x,a), v)
They're all valid interpretations. I seem to recall that Scala uses the type signatures of functions to determine this, which strikes me as unfortunate since it means that you can't really parse Scala without knowing the types of everything. We don't want to do that (and couldn't even if we wanted to), so there has to be a purely syntactic rule to determine which meaning is intended.
Right, I see your point on the ambiguity of how far to go out. In Clojure the whole expression is wrapped in #(...) so it's unambiguous.
In Julia is it idiomatic to use _ as a don't-care value? Like x, _ = somefunc() if somefunc returns two values and you only want the first one?
To solve that I think we'd need macro with an interpolation-like usage:
some_obj |> @$(move($, 4, 5)) |> @$(scale($, 10)) |> display
but again, I think it's getting pretty noisy at that point, and I don't think that @$(move($, 4, 5)) gives us anything over the existing syntax x -> move(x, 4, 5), which is IMO both prettier and more explicit.
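For comparison, here is a rough sketch of what such a placeholder macro could look like in current Julia. The macro name `@fn`, the helper `replace_placeholder`, and the functions `shift` and `times` are all hypothetical names invented for this example; the macro form makes the extent of the anonymous function explicit, which sidesteps the ambiguity discussed earlier.

```julia
# Recursively replace every `_` symbol in an expression with `sym`.
replace_placeholder(ex, sym) = ex
replace_placeholder(ex::Symbol, sym) = ex === :_ ? sym : ex
replace_placeholder(ex::Expr, sym) =
    Expr(ex.head, (replace_placeholder(a, sym) for a in ex.args)...)

# Hypothetical macro: `@fn(expr)` becomes `x -> expr` with each `_` replaced by `x`.
macro fn(ex)
    arg = gensym("arg")
    body = replace_placeholder(ex, arg)
    esc(:($arg -> $body))
end

# Hypothetical stand-ins for `move` and `scale`.
shift(x, dx) = x + dx
times(x, k) = x * k

result = 1 |> @fn(shift(_, 4)) |> @fn(times(_, 10))
```

It works, but as the comment says, `@fn(shift(_, 4))` is hardly shorter or clearer than `x -> shift(x, 4)`.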
I think this would be a good application of an infix macro. As with #4498, if whatever rule defines functions as infix applied to macros as well, we could have a @-> or @|> macro that would have the threading behavior.
Ya, I like the infix macro idea, although a new operator could just be introduced for this use in lieu of having a whole system for inplace macros. For example,
some_obj ||> move($,4,5) ||> scale($, 10) |> disp
or maybe just keep |> but have a rule that x |> f implicitly transforms into x |> f($):
some_obj |> scale($,10) |> disp
Folks, it all really looks ugly: |> ||> etc.
So far I have found Julia's syntax to be so clear that the things discussed above don't look pretty by comparison.
In Scala it's probably the worst thing - they have so many operators like ::, :, <<, >> +:: and so on - it just makes any code ugly and unreadable for anyone without a few months of experience in using the language.
Sorry to hear you don't like the proposals, Anton. It would be helpful if you made an alternative proposal.
Oh sorry, I am not trying to be unkind. And yes - criticism without proposals is useless.
Unfortunately I am not a scientist constructing languages so I just do not know what to propose... well, except making methods optionally owned by objects as it is in some languages.
I like the phrase "scientist constructing languages" - it sounds much more grandiose than numerical programmers sick of Matlab.
I feel that almost every language has a way to chain functions - either by repeated application of . in OO languages, or special syntax just for that purpose in more functional languages (Haskell, Scala, Mathematica, etc.). Those latter languages also have special syntax for anonymous function arguments, but I don't think Julia is really going to go there.
I'll reiterate support for Spencer's proposal - x |> f(a) gets translated into f(x, a), very analogously to how do blocks work (and it reinforces a common theme that the first argument of a function is privileged in Julia for syntactic sugar purposes). x |> f is then seen as short-hand for x |> f(). It's simple, doesn't introduce any new operators, handles the vast majority of cases that we want function chaining for, is backwards-compatible, and fits with existing Julia design principles.
I also think that is the best proposal here, main problem being that it seems to preclude defining |> for things like I/O redirection or other custom purposes.
Just to note, . is not a special function chaining syntax, but it happens to work that way if the function on the left returns the object it just modified, which is something that the library developer has to do intentionally.
Analogously, in Julia a library developer can already support chaining with |> by defining their functions of N arguments to return a function of 1 argument when given N-1 arguments, as mentioned earlier in this thread.
That would seem to cause problems if you want your function to support variable number of args, however, so having an operator that could perform the argument stuffing would be nice.
@JeffBezanson, it seems that this operator could be implemented if there was a way to do infix macros. Do you know if there's an ideological issue with that, or is it just not implemented?
Recently, ~ was special-cased so that it quotes its arguments and calls the macro @~ by default. |> could be made to do the same thing.
Of course, in a few months, someone will ask for <| to do the same...
Right, I definitely wouldn't want this to be a special case. Handling it in your API design is actually not that bad, and even the variable arguments limitation isn't too much of an issue if you have type annotations to disambiguate.
function move(obj::MyType, x, y, args...)
    # do stuff
    obj
end
move(args...) = obj::MyType -> move(obj, args...)
I think this behavior could be handled by a @composable macro that would handle the 2nd declaration.
The infix macro idea is attractive to me in the situation where it would be unified with declaring infix functions, which is discussed in #4498.
Why are Julia's creators so against allowing objects to contain their own methods? Where could I read more about that decision? What thoughts and theory are behind it?
@meglio a more useful place for general questions is the mailing list or the StackOverflow julia-lang tag. See Stefan's talk and the archives of the users and dev lists for previous discussions on this topic.
Just chiming in, to me the most intuitive thing is to have some placeholder be replaced by the value of the previous expression in the sequence of things you're trying to compose, similar to clojure's as-> macro. So this:
@as _ begin
    3+3
    f(_,y)
    g(_) * h(_,z)
end
would be expanded to:
g(f(3+3,y)) * h(f(3+3,y),z)
You can think of the expression on the previous line "dropping down" to fill the underscore hole on the next line.
I started sketching a tiny something like this last quarter in a bout of finals week procrastination.
We could also support a one-liner version using |>:
@as _ 3+3 |> f(_,y) |> g(_) * h(_,z)
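To make the block form concrete, here is a hedged sketch of such a macro. It is not the implementation referred to in this comment: the name `@thread_as` and the helper `subst` are hypothetical, and instead of substituting each expression textually into the next it rebinds one temporary inside a `let`, so every step is evaluated only once.

```julia
# Replace every occurrence of the placeholder symbol `ph` in `ex` with `val`.
subst(ex, ph, val) = ex
subst(ex::Symbol, ph, val) = ex === ph ? val : ex
subst(ex::Expr, ph, val) =
    Expr(ex.head, (subst(a, ph, val) for a in ex.args)...)

# Hypothetical macro: thread a value through the statements of a begin block.
macro thread_as(ph, block)
    # Drop the LineNumberNodes the parser inserts between statements.
    steps = [s for s in block.args if !(s isa LineNumberNode)]
    tmp = gensym("tmp")
    # Each step is evaluated once and rebinds the same temporary.
    assigns = [:($tmp = $(subst(s, ph, tmp))) for s in steps]
    esc(quote
        let
            $(assigns...)
            $tmp
        end
    end)
end

# Hypothetical functions; `calls` counts how often `f` runs.
calls = Ref(0)
f(x, y) = (calls[] += 1; x + y)
g(x) = 2x
h(x, z) = x - z
y, z = 1, 3

result = @thread_as _ begin
    3 + 3
    f(_, y)
    g(_) * h(_, z)
end
```

With these definitions the last step sees `_ == 7`, and `f` runs a single time even though the placeholder appears twice on the final line.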
@porterjamesj, I like that idea!
I agree; that is pretty nice, and has an appealing generality.
I like @porterjamesj's idea not only because it is a breath of fresh air, but because it seems much more flexible than previous ideas. We're not married to only using the first argument, we have free rein over the choice of intermediate variable, and this also seems like something that we can implement right now without having to add new syntax or special-cases to the language.
Note that in Julia, because we don't do much of the obj.method(args...) pattern, and instead do the method(obj, args...) pattern, we tend not to have methods that return the objects they operate on for the express purpose of method chaining. (Which is what jQuery does, and is fantastic in javascript). So we don't save quite as much typing here, but for the purpose of having "pipes" set up between functions, I think this is really nice.
Given that clojure's -> and ->> are just special cases of the above, and fairly common, we could probably implement those pretty easily too. Although the question of what to call them is a bit tricky. Maybe @threadfirst and @threadlast?
I like the idea of this being a macro too.
Isn't it better if the expansion, following the example, is something like
tmp = 3+3; tmp = f(tmp, y); return g(tmp) * h(tmp, z)
to avoid multiple calls to the same operation? (Maybe that was already implicit in @porterjamesj's idea)
Another suggestion: would it be possible that the macro expands the shortcuts f to f(_) and f(y) to f(_,y)? Maybe it will be too much, but I think that then we have an option to use the placeholder only when needed... (the shortcuts must, however, be allowed only on standalone function calls, not on expressions like the g(_) * h(_,z) above)
@cdsousa the point about avoiding multiple calls is a good one. The clojure implementation uses sequential let bindings to achieve this; I'm not sure if we can get away with this though because I don't know enough about the performance of our let.
So is the @as macro using line breaks and => as split points to decide what's the substitution expression and what's getting substituted?
let performance is good; now it can be as fast as a variable assignment when possible, and also pretty fast otherwise.
@ssfrr my toy implementation just filters out all the linebreak related nodes that the parser inserts (N.B., I don't really understand all these, it would probably be good to have documentation on them in the manual) and then reduces the substitution over the list of expressions that remains. Using let would be better though I think.
> Another suggestion: would it be possible that the macro expands the shortcuts f to f(_) and f(y) to f(_,y)
f to f(_) makes sense to me. For the second, I'm of the opinion that explicitly specifying the location is better, since reasonable people could argue that either f(_,y) or f(y,_) is more natural.
> Given that clojure's -> and ->> are just special cases of the above, and fairly common, we could probably implement those pretty easily too. Although the question of what to call them is a bit tricky. Maybe @threadfirst and @threadlast?
I think specifying the location explicitly with f(_,y...) or f(y..., _) allows the code to be quite understandable. While the extra syntax (and operators) make sense in Clojure, we don't really have additional operators available, and I think the additional macros would generally make the code less clear.
> So is the @as macro using line breaks and => as split points to decide what's the substitution expression and what's getting substituted?
I would think it more natural to use |> as a split point, since it is already used for pipelining.
Just so you know, there's an implementation of the threading macro in Lazy.jl, which lets you write, for example:
@>> range() map(x->x^2) filter(iseven)
On the plus side, it doesn't require any language changes, but it gets a bit ugly if you want to use more than one line.
Lazy.jl now has an @as macro, too.
You can also do something like this (though using a Haskell-like syntax) with Monads.jl (note: it needs to be updated to use current Julia syntax). But I suspect that a specialized version for just argument threading should be able to avoid the performance pitfalls the general approach has.
Lazy.jl looks like a very nice package, and actively maintained. Is there a compelling reason this needs to be in Base?
How will function chaining work with functions returning multiple values?
What would be the result of chaining e.g.:
function foo(a,b)
    a+b, a*b # x,y respectively
end
and bar(x,z,y) = x * z - y?
Wouldn't it require a syntax like bar(_1,z,_2)?
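For reference, a small sketch of how this particular case can already be written with the existing |> and an anonymous function that destructures the tuple. The value chosen for `z` is an assumption made up for the example; `foo` and `bar` follow the definitions above.

```julia
foo(a, b) = (a + b, a * b)    # returns the tuple (x, y)
bar(x, z, y) = x * z - y
z = 10                        # hypothetical value for the extra argument

# Destructure the returned tuple in the anonymous function's argument list.
result = foo(2, 3) |> (((x, y),) -> bar(x, z, y))
```

Naming the tuple elements plays the role that `_1` and `_2` would play in a placeholder syntax, at the cost of spelling out the closure.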
Throwing in another example:
data = [2.255, 3.755, 6.888, 7.999, 9.001]
The clean way to write log(sum(round(data))) is data|>round|>sum|>log
But if we wanted to do a base 2 log, and wanted to round to 3 decimals, then we can only use the first form:
log(2,sum(round(data,3)))
But ideally we would like to be able to do:
data|>round(_,3)|>sum|>log(2,_)
(or similar)
I have made a prototype for how I suggest it should work.
https://github.com/oxinabox/Pipe.jl
It does not solve @gregid's point, but I am working on that now.
It also does not handle the need to expand the arguments.
It is similar to @one-more-minute's Lazy.jl threading macros but keeps the |> symbol for readability (personal preference).
I'll slowly make it into a package, perhaps, at some point.
One more option is:
data |> x -> round(x,2) |> sum |> x -> log(2,x)
Although longer than log(2,sum(round(data,2))) this notation sometimes helps readability.
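As a runnable illustration of the anonymous-function form, here is a sketch for current Julia versions. It assumes Julia 1.x, where rounding an array needs the broadcast form `round.(x; digits=n)` instead of the `round(x, n)` call used in this thread.

```julia
data = [2.255, 3.755, 6.888, 7.999, 9.001]

# Nested form.
nested = log(2, sum(round.(data; digits=3)))

# Piped form. Note that the body of `x -> ...` extends as far right as it can,
# so the rest of the chain is actually parsed inside the first closure.
piped = data |> x -> round.(x; digits=3) |> sum |> x -> log(2, x)
```

Both spellings perform the same operations in the same order; the parsing detail only matters if a later stage needs to refer to something outside the closure's scope.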
@shashi that is not bad, I didn't think of that.
I think it is generally too verbose to be easily readable.
https://github.com/oxinabox/Pipe.jl now does solve @gregid's problem.
Though if you ask for both _[1] and _[2] it does this by making multiple calls to the substitution, which I am not certain is the most desirable behaviour.
As an outsider, I think the pipeline operator would benefit from adapting F#'s treatment of it.
Granted, F# has currying, but some magic could perhaps be done on the back end to have it not require that. Like, in the implementation of the operator, and not the core language.
This would make [1:10] |> map(e -> e^2) result in [1, 4, 9, 16, 25, 36, 49, 64, 81, 100].
Looking back, @ssfrr alluded to this, but the obj argument in their example would be automatically given to map as the second argument in my example, thus saving programmers from having to define their functions to support it.
What do you propose that it mean?
On Jun 5, 2015, at 5:22 PM, H-225 notifications@github.com wrote:
> As an outsider, I think one of the better ways to do this would be to adapt F#'s treatment of it.
> Granted, F# has currying, but some magic could perhaps be done on the back end to have it not require that. Like, in the implementation of the operator, and not the core language.
> This would make [1:10] |> map(e -> e^2) result in [1, 4, 9, 16, 25, 36, 49, 64, 81, 100].
> Personally, I think that it is nice and clear without being too verbose.
> Obviously, one could write result = map(sqr, [1:10]), but then why have the pipeline operator at all?
> Perhaps there is something I'm missing?
@StefanKarpinski
Basically, have the operator work like either:
x |> y(f) = y(x, f)
x |> y(f) = y(f, x)
Perhaps have an interface pattern that any function to be used with the operator takes the data to operate on as either the first or last argument, depending on which of the above is selected to be that pattern.
So, for the map function as an example, map would either be map(func, data) or map(data, func).
Is that any clearer?
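A hedged sketch of the second reading, x |> y(f) = y(f, x), using only what Base provides in Julia 1.x: `Base.Fix1(map, f)` is a callable that fixes the first argument of `map`, so it behaves like the `map(f)` form proposed here without changing the operator. The square function is just an example input.

```julia
# Base.Fix1(map, f) acts like `data -> map(f, data)`.
squares = 1:10 |> Base.Fix1(map, e -> e^2)

# Equivalent spelling with an explicit closure.
squares_closure = 1:10 |> data -> map(e -> e^2, data)
```

This covers the "data goes last" convention; `Base.Fix2` does the same for functions that take the data first and one extra argument second.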
> Lazy.jl looks like a very nice package, and actively maintained. Is there a compelling reason this needs to be in Base?
I think this is the important question here.
The reason this may be desirable in Base is twofold:
1.) We may want to encourage pipelining as being the Julian Way -- arguments can be made that it is more readable
2.) Things like Lazy.jl, FunctionalData.jl, and my own Pipe.jl require a macro to wrap the expression they act on -- which makes it less readable.
I feel the answer may lie in having infix macros, and defining |> as such.
I'm not certain that |> (or its cousin the do block) belongs in core at all.
But the tools don't exist to define them outside of the parser.
The ability to have that sort of pipelining syntax seems very nice. Could just that be added to Base, i.e. the x |> y(f) = y(f, x) part, that Lazy.jl, FunctionalData.jl, and Pipe.jl could use?
Having looked at code that uses the various implementations of this out in packages, I personally find it unreadable and very much un-Julian. The left-to-right pipeline pun doesn't help readability, it just makes your code stand out as backwards from the rest of the perfectly normal code that uses parentheses for function evaluation. I'd rather discourage a syntax that leads to 2 different styles where code written in either style looks inside-out and backwards relative to code written in the other. Why not just settle on the perfectly good syntax we already have and encourage making things look more uniform?
@tkelman
Personally, I see it from a somewhat utilitarian point of view.
Granted, maybe if you're doing something simple then it isn't necessary, but if you're writing a function say, that does something fairly complicated, or long winded, (off the top of my head: data manipulation e.g.), then I think that's where pipeline syntax shines.
I understand what you mean though; it would be more uniform if you had one function call syntax for everything. Personally though, I think it's better to make it easier to write [complicated] code that can be easily understood. Granted, you have to learn the syntax and what it means, but, IMHO, |> is no harder to grasp than how to call a function.
@tkelman I'd look at it from a different point of view. Obviously, there are people who prefer that style of programming. I can see that maybe you'd want to have a consistent style for the source code to Base, but this is only about adding parser support for their preferred style of programming their Julia applications. Do julians really want to try to dictate or otherwise stifle something other people find beneficial?
I've found pipelining stuff together very useful in Unix, so even though I've never used a programming language that enabled it in the language, I'd at least give it the benefit of the doubt.
We do have |> as a function piping operator, but there are implementation limitations to how it's currently done that make it pretty slow at the moment.
Piping is great in a unix shell where everything takes text in and text out. With more complicated types and multiple inputs and outputs, it's not as clear-cut. So we have two syntaxes, but one makes a lot less sense in the MIMO case. Parser support for alternate styles of programming or DSL's is not usually necessary since we have powerful macros.
OK, thanks, I was going by @oxinabox's comment:
> But the tools don't exist to define them outside of the parser.
Is it understood what would be done to remove the implementation limitations you referred to?
Some of the earlier suggestions could potentially be implemented by making |> parse its arguments as a macro instead of as a function. The former command-object piping meaning of |> has been deprecated, so this might actually be freed up to do something different with, come 0.5-dev.
However this choice reminds me quite a bit of the special parsing of ~ which I feel is a mistake for reasons I've stated elsewhere.
Parsing ~ is just insane, it's a function in base. Using _, _1, _2 seems more reasonable (esp. if you raise if these variables are defined elsewhere in scope). Still, until we have more efficient anonymous functions this seems like it's not going to work...
> implemented by making |> parse its arguments as a macro instead of as a function
Unless you do that!
> Parsing ~ is just insane, it's a function in base
It's a unary operator for the bitwise version. Infix binary ~ parses as a macro, ref #4882, which I think is a strange use of an ascii operator (#11102 (comment)).
> So we have two syntaxes, but one makes a lot less sense in the MIMO case.
3 Syntaxes. Kind of.
Pipe in, Normal function call and Do-blocks.
Debatable even 4, since Macros use a different convention as well.
For me, having the read order (i.e. left to right) match the application order makes SISO function chains a lot clearer.
I write a lot of code like this (using Iterators.jl and Pipe.jl):
loaddata(filename) |> filter(s-> 2<=length(s)<=15, _) |> take!(150,_) |> map(eval_embedding, _)
results |> get_error_rate(desired_results, _) |> round(_,2)
For SISO, it's better (for my personal preference); for MIMO it is not.
Julia seems to have already settled towards there being multiple correct ways to do things.
Which I am not 100% sure is a good thing.
As I said I would kind of like Pipe and Do blocks moved out of the main language.
Do-blocks have quite a few very helpful use cases, but it has annoyed me a little that they have to use the first input as the function, which doesn't always fit in quite right with the multiple dispatch philosophy (and neither would pandas/D style UFCS with postfix data.map(f).sum(); I know it's popular but I don't think it can be combined effectively with multiple dispatch).
Piping can probably be deprecated quite soon, and left to packages to use in DSL's like your Pipe.jl.
> Julia seems to have already settled towards there being multiple correct ways to do things.
> Which I am not 100% sure is a good thing.
It's related to the question of whether or not we can rigorously enforce a community-wide style guide. So far we haven't done much here, but for long-term package interoperability, consistency, and readability I think this will become increasingly important as the community grows. If you're the only person who will ever read your code, go nuts and do whatever you want. If not though, there's value in trading off slightly worse (in your own opinion) readability for the sake of uniformity.
@tkelman @oxinabox
I have yet to find a clear reason why it should not be included in the language, or indeed in the "core" packages. [e.g: Base]
Personally, I think making |> a macro might be the answer.
Something like this perhaps? (I'm not a master Julia programmer!)
macro (|>) (x, y::Union(Symbol, Expr))
    if isa(y, Symbol)
        y = Expr(:call, y) # assumes y is callable
    end
    push!(y.args, x)
    return eval(y)
end
Under Julia v0.3.9, I was unable to define it twice -- once with a symbol, and once with an expression; my [limited] understanding of Union is that there is a performance hit from using it, so I'm guessing that would be something to rectify in my toy example code.
Of course, there is a problem with the usage syntax for this.
For example, to run the equivalent of log(2, 10), you have to write @|> 10 log(2), which isn't desirable here.
My understanding is that you'd have to be able to somehow mark functions/macros as "infixable", as it were, such that you could then write it thus: 10 |> log(2). (Correct me if I'm wrong!)
Contrived example, I know. I can't think of a good one right now! =)
It's also worth pointing out one area I have not covered in my example...
So e.g:
julia> for e in ([1:10], [11:20] |> zip) println(e) end
(1,11)
(2,12)
(3,13)
(4,14)
(5,15)
(6,16)
(7,17)
(8,18)
(9,19)
(10,20)
Again - contrived example, but hopefully you get the point!
I did some fiddling, but as of writing this I was unable to fathom how to implement that, myself.
Please see #554 (comment) and #11608.
On Jun 9, 2015, at 9:37 PM, H-225 notifications@github.com wrote:
> I have yet to find a clear reason why it should not be included in the language
This is the wrong mental stance for programming language design. The question must be "why?" rather than "why not?" Every feature needs a compelling reason for its inclusion, and even with a good reason, you should think long and hard before adding anything. Can you live without it? Is there a different way to accomplish the same thing? Is there a different variation of the feature that would be better and more general or more orthogonal to the existing features? I'm not saying this particular idea couldn't happen, but there needs to be a far better justification than "why not?" with a few examples that are no better than the normal syntax.
> The question must be "why?" rather than "why not?"
+1_000_000
Indeed.
See this fairly well known blog post:
Every feature starts with -100 points.
It needs to make a big improvement to be worth adding to the language.
FWIW, Pyret (http://www.pyret.org/) went through this exact discussion a few months ago. The language supports a "cannonball" notation which originally functioned much the way that people are proposing with |>. In Pyret,
[list: 1, 2, 3, 5] ^ map(add-one) ^ filter(is-prime) ^ sum() ^ ...
So, the cannonball notation desugared into adding arguments to the functions.
It didn't take long before they decided that this syntax was too confusing. Why is sum() being called without any arguments? etc. Ultimately, they opted for an elegant currying alternative:
[list: 1, 2, 3, 5] ^ map(_, add-one) ^ filter(_, is-prime) ^ sum() ^ ...
This has the advantage of being more explicit and simplifies the ^ operator to a simple function.
Yes, that seems much more reasonable to me. It is also more flexible than currying.
@StefanKarpinski I'm a little confused. Did you mean to say more flexible than chaining (not currying)? After all Pyret's solution was to simply use currying, which is more general than chaining.
Maybe, if we modify the |> syntax a little bit (I really don't know how hard it is to implement, maybe it conflicts with | and >), we could set something flexible and readable.
Defining something like
foo(x,y) = (y,x)
bar(x,y) = x*y
We would have:
randint(10) |_> log(_,2) |> sum
(1,2) |_,x> foo(_,x) |x,_> bar(_,2) |_> round(_, 2) |> sum |_> log(_, 2)
In other words, we would have an operator like |a,b,c,d> where a, b, c and d would get the returned values of the last expression (in order) and use them in placeholders inside the next one.
If there are no variables inside |> it would work as it works now. We could also set a new standard: f(x) |> g(_, 1) would get all values returned by f(x) and associate them with the _ placeholder.
@samuela, what I meant was that with currying you can only omit trailing arguments, whereas with the _ approach, you can omit any arguments and get an anonymous function. I.e. given f(x,y) with currying you can do f(x) to get a function that does y -> f(x,y), but with underscores you can do f(x,_) for the same thing but also do f(_,y) to get x -> f(x,y).
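To make the distinction concrete, here is a minimal sketch that spells out by hand the two anonymous functions the comment describes. The function f and its body are made up for illustration; f(1) and f(_, 2) are not valid calls here, so the lambdas are written explicitly.

```julia
# Hypothetical two-argument function, standing in for the f(x, y) above.
f(x, y) = x - 10y

# Currying fixes the leading argument: f(1) would mean y -> f(1, y).
fix_first = y -> f(1, y)

# The underscore proposal could also fix a later argument:
# f(_, 2) would mean x -> f(x, 2).
fix_second = x -> f(x, 2)

fix_first(2)    # same as f(1, 2)
fix_second(1)   # same as f(1, 2)
```

Both closures end up calling the same f; the only difference is which argument position stays open.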
While I like the underscore syntax, I'm still not satisfied with any proposed answer to the question of how much of the surrounding expression it "captures".
what do you do if a function returns multiple results? Would it have to pass a tuple to the _ position? Or could there be a syntax to split it up on the fly? May be a stupid question, if so, pardon!
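For what it's worth, here is a sketch of how a tuple result can be split up with syntax that exists in current Julia (1.x), without any new operator: an anonymous function can destructure a tuple parameter, or the tuple can be splatted. foo and bar are the definitions from the proposal above; everything else is illustrative.

```julia
# Functions from the proposal above.
foo(x, y) = (y, x)
bar(x, y) = x * y

# A lambda whose single parameter is destructured as a tuple.
swapped = (1, 2) |> (((a, b),) -> foo(a, b))
product = swapped |> (((a, b),) -> bar(a, b))

# Splatting is an alternative when the whole tuple maps onto the arguments.
also_swapped = (1, 2) |> (t -> foo(t...))
```

This does not answer whether _1, _2 style placeholders would be a good idea; it only shows that the tuple case is expressible today, at the cost of extra parentheses.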
@StefanKarpinski Ah, I see what you mean. Agreed.
@ScottPJones the obvious answer is to allow ASCII art arrows:
http://scrambledeggsontoast.github.io/2014/09/28/needle-announce/
@simonbyrne That looks even worse than programming in Fortran IV on punched cards, like I did in my misspent youth! Just wondered if some syntax like _1, _2, etc. might allow pulling apart a multiple return, or is that just a stupid idea on my part?
@simonbyrne That's brilliant. Implementing that as a string macro would be an amazing GSoC project.
Why is sum() being called without any arguments?
I think that the implicit argument is also one of the more confusing things about do notation, so it would be nice if we could utilise the same convention for that as well (though I realise that it is much more difficult, as it is already baked into the language).
@simonbyrne You don't think it could be done in an unambiguous way? If so, that's something I think is worth breaking (the current do notation), if it can be made more logical, more general, and consistent with chaining.
@simonbyrne Yeah, I totally agree. I understand the motivation for the current do notation but I feel strongly that it doesn't justify the syntactical gymnastics.
@samuela regarding map(f, _) vs just map(f). I agree that some magic desugaring would be confusing, but I do think map(f) is something that should exist. It wouldn't require any sugar, just adding a simple method to map.
eg
map(f::Base.Callable) = function(x::Any...) map(f,x...) end
i.e. map takes a function and then returns a function that works on things that are iterable (more or less).
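A sketch of how that convenience-method idea composes with the existing |>, written so it runs on current Julia. The names mapping and filtering are hypothetical wrappers, used here only so that Base.map and Base.filter are left untouched; addone is also made up for the example.

```julia
# Hypothetical wrappers: each takes a callable and returns a callable.
mapping(f) = (xs...) -> map(f, xs...)
filtering(pred) = xs -> filter(pred, xs)

addone(x) = x + 1

# Each stage is an ordinary one-argument function, so plain |> is enough.
result = [1, 2, 3, 5] |> mapping(addone) |> filtering(iseven) |> sum
```

No special desugaring is involved: |> just applies the function on its right to the value on its left.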
More generally I think we should lean towards functions that have additional "convenience" methods, rather than some sort of convention that |> always maps data to the first argument (or similar).
In the same vein there could be a
type Underscore end
_ = Underscore()
and a general convention that functions should/could have methods that take underscores in certain arguments, and then return functions that take fewer arguments. I'm less convinced that this would be a good idea, as one would need to add 2^n methods for each function that takes n arguments. But it's one approach. I wonder if it would be possible to not have to explicitly add so many methods but rather hook into the method look up, so that if any arguments are of type Underscore then the appropriate function is returned.
Anyway, I definitely think having a version of map and filter that just take a callable and return a callable makes sense, the thing with the Underscore may or may not be workable.
@patrickthebold
I would imagine that x |> map(f, _) => x |> map(f, Underscore()) => x |> map(f, x), as you propose, would be the simplest way to implement map(f, _), right? - just have _ be a special entity which you'd program for?
Though, I'm uncertain if that would be better than having it automatically inferred by Julia-- presumably using the |> syntax-- rather than having to program it yourself.
Also, regarding your proposal for map - I kinda like it. Indeed, for the current |> that would be quite handy. Though, I imagine it would be simpler and better to just implement automatic inference of x |> map(f, _) => x |> map(f, x) instead?
@StefanKarpinski Makes sense. Hadn't thought of it quite like that.
Nothing I said would be tied to |> in any way. What I meant regarding the _ would be for example to add methods to < as such:
<(_::Underscore, x) = function(z) z < x end
<(x, _::Underscore) = function(z) x < z end
But again I think this would be a pain unless there was a way to automatically add the appropriate methods.
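For anyone who wants to try the idea, here is a rough version that runs on current Julia. It assumes a placeholder type named Slot with an instance named slot (both hypothetical names), because today's Julia treats all-underscore identifiers such as _ as assignment-only, so they cannot be passed as values.

```julia
# Hypothetical placeholder type and instance, standing in for Underscore and _.
struct Slot end
const slot = Slot()

# Two hand-written methods, one per argument position, as in the definitions above.
Base.:<(::Slot, x) = z -> z < x
Base.:<(x, ::Slot) = z -> x < z

below_five = filter(slot < 5, 1:10)   # keeps every z with z < 5
above_five = filter(5 < slot, 1:10)   # keeps every z with 5 < z
```

The sketch also shows the cost discussed above: every function, and every argument position, needs its own method.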
Again, the thing with the underscores is separate from adding the convenience method to map as outlined above. I do think both should exist, in some form or another.
@patrickthebold Such an approach with a user-defined type for underscore, etc would place a significant and unnecessary burden on the programmer when implementing functions. Having to list out all 2^n of
f(_, x, y) = ...
f(x, _, y) = ...
f(_, _, y) = ...
...
would be very annoying, not to mention inelegant.
Also, your proposition with map would I suppose provide a workaround syntax for map(f) with basic functions like map and filter but in general it suffers from the same complexity issue as the manual underscore approach. For example, for func_that_has_a_lot_of_args(a, b, c, d, e) you'd have to go through the grueling process of typing out each possible "currying"
func_that_has_a_lot_of_args(a, b, c, d, e) = ...
func_that_has_a_lot_of_args(b, c, d, e) = ...
func_that_has_a_lot_of_args(a, b, e) = ...
func_that_has_a_lot_of_args(b, d, e) = ...
func_that_has_a_lot_of_args(a, d) = ...
...
And even if you did, you'd still be faced with an absurd amount of ambiguity when calling the function: Does func_that_has_a_lot_of_args(x, y, z) refer to the definition where x=a,y=b,z=c or x=b,y=d,z=e, etc? Julia would discern between them with runtime type information but for the lay-programmer reading the source code it would be totally unclear.
I think the best way to get underscore currying done right is to simply incorporate it into the language. It would be a very straightforward change to the compiler after all. Whenever an underscore appears in a function application, just pull it out to create a lambda. I started looking into implementing this a few weeks ago but unfortunately I don't think I'll have enough free time in the next few weeks to see it through. For someone familiar with the Julia compiler though it would probably take no more than an afternoon to get things working.
@samuela
Can you clarify what you mean by, "pull it out to create a lambda"? - I'm curious. I too have wondered how that may be implemented.
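One possible reading of "pull it out to create a lambda", sketched as a macro rather than a compiler change. The macro name @partial is hypothetical, and the sketch only handles an underscore that stands for a whole argument of the outermost call; it is not the implementation anyone in this thread proposed.

```julia
# Hypothetical macro: each `_` that is a whole argument of the outermost call
# becomes a parameter of a new anonymous function. Nested underscores are left alone.
macro partial(call)
    Meta.isexpr(call, :call) || error("@partial expects a function call")
    params = Symbol[]
    args = map(call.args) do arg
        if arg === :_
            p = gensym("slot")   # fresh parameter name for this slot
            push!(params, p)
            p
        else
            arg
        end
    end
    # Build (p1, p2, ...) -> f(args...) with the slots replaced by the parameters.
    return esc(Expr(:->, Expr(:tuple, params...), Expr(:call, args...)))
end

addone(x) = x + 1

inc_all = @partial map(addone, _)   # behaves like x -> map(addone, x)
below5  = @partial _ < 5            # behaves like x -> x < 5
```

Because infix operators parse as ordinary calls, the same rewrite covers both map(addone, _) and _ < 5.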
@patrickthebold
Ah - I see. Presumably you could then use such a thing like this: filter(_ < 5, [1:10]) => [1:4]?
Personally, I would find filter(e -> e < 5, [1:10]) easier to read; more consistent - less hidden meaning, though I grant you, it is more concise.
Unless you have an example where it really shines?
Also, your proposition with map would I suppose provide a workaround syntax for map(f) with basic functions like map and filter but in general it suffers from the same complexity issue as the manual underscore approach.
I wasn't suggesting that this be done in general, only for map and filter, and possibly a few other places where it seems obvious. To me, that's how map should work: take in a function and return a function. (pretty sure that's what Haskell does.)
would be very annoying, not to mention inelegant.
I think we are in agreement on that. I'd hope there would be a way to add something to the language to handle method invocations where some arguments are of type Underscore. Upon further thought, I think it boils down to having a special character automatically expand into a lambda, or have a special type that automatically expands into a lambda. I don't feel strongly either way. I can see pluses and minuses to both approaches.
@H-225 yes the underscore thing is just a syntactic convenience. Not sure how common it is, but Scala certainly has it. Personally I like it, but I think it's just one of those style things.
@H-225 Well, in this case I think a compelling and relevant example would be function chaining. Instead of having to write
[1, 2, 3, 5]
|> x -> map(addone, x)
|> x -> filter(isprime, x)
|> sum
|> x -> 3 * x
|> ...
one could simply write
[1, 2, 3, 5]
|> map(addone, _)
|> filter(isprime, _)
|> sum
|> 3 * _
|> ...
I find myself unknowingly using this underscore syntax (or some slight variant) constantly in languages that support it and only realize how helpful it is when transitioning to work in languages that do not support it.
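For comparison, here is a version of the first chain that runs as written on current Julia. Two assumptions: isprime is not in Base anymore (it lives in the Primes package), so a small stand-in named isprime_naive is defined here, and the |> operators are placed at the ends of lines because a line cannot start with a binary operator.

```julia
addone(x) = x + 1

# Hypothetical stand-in for isprime, using trial division.
isprime_naive(n) = n > 1 && all(d -> n % d != 0, 2:isqrt(n))

result = [1, 2, 3, 5] |>
    (xs -> map(addone, xs)) |>
    (xs -> filter(isprime_naive, xs)) |>
    sum |>
    (s -> 3 * s)
```

The parentheses around each lambda keep the stages separate; they are part of the noise that the underscore form is meant to remove.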
As far as I know, there are currently at least 3.5 libraries/approaches that attempt to address this problem in Julia: Julia's builtin |> function, Pipe.jl, Lazy.jl, and 0.5 for Julia's builtin do notation which is similar in spirit. Not to bash any of these libraries or approaches, but many of them could be greatly simplified if underscore currying was supported by Julia.
@samuela if you'd like to play with an implementation of this idea, you could try out FunctionalData.jl, where your example would look like this:
@p map [1,2,3,4] addone | filter isprime | sum | times 3 _
The last part shows how to pipe the input into the second parameter (default is argument one, in which case the _ can be omitted). Feedback very much appreciated!
Edit: the above is simply rewritten to:
times(3, sum(filter(map([1,2,3,4],addone), isprime)))
which uses FunctionalData.map and filter instead of Base.map and filter. Main difference is the argument order, second difference is the indexing convention (see docs). In any case, Base.map can simply be used by reversing the argument order. @p is quite a simple rewrite rule (left to right becomes inner-to-outer, plus support for simple currying): @p map data add 10 | showall becomes
showall(map(data, x->add(x,10)))
Hack may introduce something like this: facebook/hhvm#6455. They're using $$ which is off the table for Julia ($ is already too overloaded).
FWIW, I really like Hack's solution to this.
I like it too, my main reservation being that I'd still kind of like a terser lambda notation that might use _ for variables / slots and it would be good to make sure that these don't conflict.
Couldn't one use __? What's the lambda syntax you're thinking of? _ -> sqrt(_)?
Sure, we could. That syntax already works, it's more about a syntax that doesn't require the arrow, so that you can write something along the lines of map(_ + 2, v), the real issue being how much of the surrounding expression the _ belongs to.
Doesn't Mathematica have a similar system for anonymous arguments? How do they handle the scope of the binding of those arguments?
On Tue, Nov 3, 2015 at 9:09 AM Stefan Karpinski notifications@github.com
wrote:
Sure, we could. That syntax already works, it's more about a syntax that doesn't require the arrow, so that you can write something along the lines of map(_ + 2, v), the real issue being how much of the surrounding expression the _ belongs to.
https://reference.wolfram.com/language/tutorial/PureFunctions.html, showing the # symbol, is what I was thinking of.
Mathematica uses & to delimit it.
Rather than doing something as general as a shorter lambda syntax (which could take an arbitrary expression and return an anonymous function) we could get around the delimiter problem by confining the acceptable expressions to function calls, and the acceptable variables / slots to entire parameters. This would give us a very clean multi-parameter currying syntax à la Open Dylan. Because the _ replaces entire parameters, the syntax could be minimal, intuitive, and unambiguous. map(_ + 2, _) would translate to x -> map(y -> y + 2, x). Most non-function call expressions that you would want to lambdafy would probably be longer and more amenable to -> or do anyway. I do think the trade-off of usability vs generality would be worth it.
@durcan, that sounds promising. Can you elaborate on the rule a bit? Why does the first _ stay inside the argument of map while the second one consumes the whole map expression? I'm not clear on what "confining the acceptable expressions to function calls" means, nor what "confining acceptable variables / slots to entire parameters" means...
Ok, I think I get the rule, having read some of that Dylan documentation, but I have to wonder about having map(_ + 2, v) work but map(2*_ + 2, v) not work.
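To see why the two cases differ, here is a sketch of one reading of the rule: every call whose direct arguments include _ becomes its own lambda, applied recursively. The helper name slotify and the slot1, slot2 parameter names are hypothetical, and this is only an interpretation of the rule described above, not a confirmed design.

```julia
# Hypothetical helper: rewrite an expression so that each call with `_` as a
# direct argument is wrapped in its own anonymous function.
function slotify(ex, counter = Ref(0))
    Meta.isexpr(ex, :call) || return ex
    params = Symbol[]
    args = Any[ex.args[1]]
    for arg in ex.args[2:end]
        if arg === :_
            counter[] += 1
            p = Symbol("slot", counter[])
            push!(params, p)
            push!(args, p)
        else
            push!(args, slotify(arg, counter))   # nested calls get their own lambda
        end
    end
    call = Expr(:call, args...)
    return isempty(params) ? call : Expr(:->, Expr(:tuple, params...), call)
end

# map(_ + 2, _) becomes a lambda over the whole map call, with an inner lambda for _ + 2.
add_two = eval(slotify(:(map(_ + 2, _))))
add_two([1, 2, 3])

# map(2*_ + 2, v): only 2*_ is wrapped, so a function would be added to 2.
awkward = slotify(:(map(2*_ + 2, v)))
```

Under this reading the underscore in 2*_ + 2 binds to the multiplication, not to the whole argument, which is exactly the scoping question raised in the comment above.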