matloff/TidyverseSkeptic

Pipe in tidyverse is a different concept from pipes in other contexts.


tidyverse claims that the pipe operator is like the pipe on a POSIX command line, for example ps | grep firefox. The idea of using a pipe in R originally stemmed from a desire to use something like the pipe in functional languages.

Piping on the command line attaches a stream between the stdout of one program and the stdin of the next. A stream is not the same as an object in R, and the stdin and stdout of a program are not the same as the input and output of a function. It also involves forks: each stage runs as its own process, so the order in which things are executed differs from a sequence of function calls.

Piping in functional languages is a syntactic rearrangement that changes a function call from f x y to y |> f x, for some function f with arguments x and y. It fills the last argument of a function, and it relies on currying, which doesn't occur in R without additional packages.
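To make the functional-language version concrete, here is a rough sketch in plain R. Since R does not curry on its own, the function is curried by hand, and `%then%` is a hypothetical operator invented for this illustration (it is not part of base R or any package).

```r
# A hand-curried two-argument function: f(x) returns a function of y.
f <- function(x) function(y) x - y

# Hypothetical pipe in the functional-language style: apply the
# right-hand side (a function) to the left-hand side.
`%then%` <- function(lhs, rhs) rhs(lhs)

direct <- f(10)(3)        # corresponds to "f x y"
piped  <- 3 %then% f(10)  # corresponds to "y |> f x": fills the LAST argument
```

Both `direct` and `piped` evaluate to 7, because the piped value 3 ends up as the last argument y.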

These are both very different from what tidyverse calls a pipe, which is a syntactic rearrangement of f(x, y) to x %>% f(y). Semantically, this pipe is identical to member access in object-oriented languages in the C++ vein, where member functions (aka methods) are part of the class (which is not the way R handles methods). For example, in C++, y.f(x) is implicitly calling f(&y, x), where a pointer to the object y is passed as the hidden first argument.
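A small sketch of this first-argument rearrangement follows. To avoid depending on a package, it uses base R's native pipe `|>` (R 4.1 or later) as a stand-in for magrittr's `%>%`; that substitution is an assumption of the example, though for simple calls like this one both insert the left-hand side as the first argument.

```r
# Requires R >= 4.1 for the native pipe.
f <- function(a, b) a - b

direct <- f(10, 3)
piped  <- 10 |> f(3)   # the piped value fills the FIRST argument

# The native pipe is rewritten by the parser before anything is evaluated,
# so the quoted expression is already an ordinary call.
rewritten <- deparse(quote(x |> f(y)))
```

Here `direct` and `piped` are both 7, and `rewritten` is the string "f(x, y)", which shows that the pipe is purely a rearrangement of the call.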

One point about this is that it is an example of the tidyverse authors ignoring existing knowledge in computer science and creating confusion by mixing up terminology and concepts. They shouldn't have called them pipes.

Another point is that this implicit rearrangement of the function call is confusing.

Another point is that it's hard to justify that this syntax provides a benefit greater than the increase in complexity. It's very simple to use temporary variables.
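As a rough illustration of the temporary-variable alternative, the sketch below computes the same result both ways. The data and variable names are made up for the example, and base R's native `|>` (R 4.1 or later) stands in for `%>%`.

```r
x <- c(4, 9, 16)

# Piped version.
piped <- x |> sqrt() |> sum()

# Same computation with temporary variables; each intermediate
# value has a name and can be inspected on its own.
roots <- sqrt(x)
total <- sum(roots)
```

Both `piped` and `total` are 9.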

I don't mind people referring to Tidy/Magrittr filters as pipes. But I agree there are major differences between those pipes and Unix shell pipes, in addition to the above. First, each stage of a shell pipe has a single input, which is sometimes not the case for Tidy, causing among other things the cognitive overload problem I cite in the essay. Second, with shell pipes, all of the stages are well known to the user, completely debugged and so on, which again is often not true for Tidy.
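One way to picture a stage with more than one input is a join, sketched here with base R's `merge()` and the native `|>` (R 4.1 or later) rather than the Tidy functions themselves. The data frames `a` and `b` are invented for the example.

```r
a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = c(2, 3, 4), y = c("b", "c", "d"))

# This stage has two inputs: `a` arrives through the pipe, while `b`
# is pulled in from outside the pipeline.
joined <- a |> merge(b, by = "id")
```

Reading the pipeline from left to right does not reveal where `b` came from or what state it is in, so the reader has to track it separately.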