matloff/TidyverseSkeptic

About tidyverse being easier to teach and learn.

karoliskoncevicius opened this issue · 4 comments

Hello, nice write-up.

I am also a bit of a skeptic when it comes to the tidyverse, but I just wanted to offer an opinion about why people say that learning the tidyverse is easier.

Basically, from what I saw when introducing people to R, a lot of resources aimed at beginners are tidyverse-specific. Most of the tutorials online, most of the videos on sites like YouTube, and most of the new or popular books, like "R for Data Science", use the tidyverse explicitly.

This might be a product of the marketing team within RStudio, but anyway, that's how it is. Nobody that I know of writes books for beginners in base R because there is no incentive for doing so. Your book, for example, "The Art of R Programming", which in my opinion is excellent, doesn't appeal to a certain kind of new user, as it doesn't get them from zero to a fancy plot in one lesson.

So in short: it's easier to learn the tidyverse not so much because of ease of use, but because, compared with base R, it has more teaching material available.

I have a tutorial, fasteR, available at https://tinyurl.com/y48bcfxv. I would claim it gets the beginner to useful R quite quickly, using just base R. But of course your point is correct.

I think R has more appeal to people with a mathematical background due to its logic and syntax. Years ago, while unaware of the existence of R, I found a code snippet and, upon reading it, understood almost everything it was doing. I was hooked.

My two beginner's books, both written in base R, were "Beginning R: The Statistical Programming Language" by Dr. Mark Gardener and "Beginning R: An Introduction to Statistical Programming" by Larry Pace. In my opinion one should not need ample literature to get started with base R.

Beginners should be given the opportunity to compare paradigms and make informed decisions. I find this most important when statistical programming is involved. Code should be concise, effective and efficient without obscuring the path to the mathematical outcome.

These books did enough to open my way to statistical programming, as I was coming from a physics background with work experience in the design of controlled experiments for manufacturing.

While I was reading the documentation for plyr, dplyr, tidy, the feeling of "OK, OK, but why?" was lurking in the back of my mind, until data.table took it away.

Later, while writing R extensions I appreciated the powerful simplicity of base R.

In my book, "Learn R: As a Language" (published last year), I have tried to present the tidyverse as an alternative, and only after the chapters on base R. It goes against the current in this and in its depth. For me, reading and rereading "The Art of R Programming" was the key to understanding the R language years ago. My perspective as a teacher is that of teaching R to Biology, Ecology and Agronomy students, mostly for programming in the small. Whatever has been said for years, R is not a difficult language to learn or teach, because the payback for the effort is ample if it is learnt well.

I think one side benefit of the tidyverse and its performance claims has been the effort put into improving the performance of base R, which now in many cases outperforms the tidyverse. Yes, the "OK, OK, but why?" feeling describes well many of the twists of the grammar, although it is also true that one finds useful functions here and there in the tidyverse. The other issue is that base R is stable and the tidyverse is anything but stable, and things like examples that no longer work, or that work for one student and not for another, can be very disconcerting and disruptive in class. In my own work I tend to use the tidyverse very selectively, and mostly teach base R to students. The grammar of graphics is a different story from the rest of the tidyverse, I think.

@aphalo :

I write my own "helpful functions" if I can help it. I use R packages very cautiously and minimally, and if I have the chance to choose between two solutions I pick the simpler, older and more troubleshootable one. For testing I prefer checkmate. I wonder if I could release packages without the need for testthat. I haven't tried so far.

Lately I have observed a curious trend of loading a large number of packages and picking one function from each when composing a relatively simple script with a relatively trivial outcome. I guess this behavior is also encouraged by the use of *verses in general.

Regarding stability, I think you are right. This was one of the reasons (there are a few more) that led me to drop Python after one year: compatibility between package versions and the need for the bloated *conda to mitigate this. Plus, to me, Python, like tidyverse et al., seems to be pushed, not pulled.

I don't think there is anyone out there who couldn't find ggplot2 and its extensions useful. I have considered it a blessing given the lattice alternative.

Anyway, so far I haven't been put in a situation where I had to stop and say "wait, I need tidyverse for that".

I sincerely hope it will never happen!