LucyMcGowan/tidycode

Functional programming: symbols vs function calls

tvatter opened this issue · 4 comments

Hi,

I'm using tidycode to analyze students' code, and I wondered about something when looking at what follows:

> "purrr::map_dbl(mtcars, mean)" %>%
  dance_recital() %>%
  unnest_calls(expr)
# A tibble: 1 x 7
  value      error  output    warnings  messages  func    args      
  <list>     <list> <list>    <list>    <list>    <chr>   <list>    
1 <dbl [11]> <NULL> <chr [1]> <chr [0]> <chr [0]> map_dbl <list [2]>

I guess that the behavior above (i.e., the call to mean is not detected) is closely related to the fact that getParseData(parse(text = "map_dbl(mtcars, mean)")) detects mean as a SYMBOL.
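For reference, here is a minimal sketch of the token distinction described above, using only base R. The variable names (`exprs`, `tokens`, `called`, `symbols`) are just for illustration, and `keep.source = TRUE` is passed explicitly because source references are not kept by default in non-interactive sessions.

```r
# Parse the snippet and keep source references so parse data is available
exprs <- parse(text = "map_dbl(mtcars, mean)", keep.source = TRUE)
pd <- utils::getParseData(exprs)

# Keep only terminal tokens (the leaves of the parse tree)
tokens <- pd[pd$terminal, c("token", "text")]
print(tokens, row.names = FALSE)

# Names that appear in call position vs. names passed as plain symbols
called  <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
symbols <- tokens$text[tokens$token == "SYMBOL"]
```

Here `map_dbl` ends up in `called`, while `mean` lands in `symbols` next to `mtcars`, which is why a tool that only looks at call positions never sees it.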

The annoying thing is that, by using functionals, students can "hide" function calls. For instance, if I tell them to create a my_factorial function that does not call R's factorial but rather computes the factorial recursively, they can "cheat" and simply do my_factorial <- function(x) purrr::map_dbl(x, factorial).

> body(my_factorial) %>% deparse() %>% dance_recital() %>% unnest_calls(expr)
# A tibble: 3 x 7
  value  error      output warnings messages func    args      
  <list> <list>     <list> <list>   <list>   <chr>   <list>    
1 <NULL> <smplErrr> <NULL> <NULL>   <NULL>   ::      <list [2]>
2 <NULL> <smplErrr> <NULL> <NULL>   <NULL>   purrr   <list [2]>
3 <NULL> <smplErrr> <NULL> <NULL>   <NULL>   map_dbl <list [2]>

Right now, I prevent this by brute-forcing the code analysis (i.e., I use stringr::str_detect), but I find this solution somewhat unpleasant...

> body(my_factorial) %>% deparse() %>% stringr::str_detect("factorial")
[1] TRUE
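A possible middle ground, sketched here with base R only: `all.names()` lists every symbol in an expression, whether or not it sits in call position, so the check can match whole names instead of substrings. The function `my_factorial_rec` is a hypothetical "honest" solution added for comparison; this is not something tidycode does for you.

```r
# The "cheating" version from above: factorial is passed as a bare symbol
my_factorial <- function(x) purrr::map_dbl(x, factorial)

# A hypothetical honest, recursive solution for comparison
my_factorial_rec <- function(n) if (n <= 1) 1 else n * my_factorial_rec(n - 1)

# all.names() returns symbols in call position and in argument position
cheat_names  <- all.names(body(my_factorial))
honest_names <- all.names(body(my_factorial_rec))

uses_factorial_cheat  <- "factorial" %in% cheat_names   # exact match on the symbol
uses_factorial_honest <- "factorial" %in% honest_names  # "my_factorial_rec" does not match
```

Unlike a substring search, the exact match does not flag the recursive version just because its own name contains "factorial".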

Any idea?

@jtleek : I guess that students could also hide p-hacking from you this way :)

PS: Up to yesterday, I didn't know about tidycode and matahari, those tools are pretty cool!

Somewhat related, I wonder how one would analyze stuff like

myfunc <- purrr::compose(sqrt, abs)

One could debate whether sqrt and abs are really executed, but if you think of a dummy implementation like the one shown below, I would definitely argue that they are.

mycompose <- function(f, g) {
  force(f)
  force(g)
  function(...) f(g(...))
}
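To make the point concrete, here is a small sketch (base R only) showing that the dummy implementation really does hold on to both functions: they live in the enclosing environment of the returned closure and can be inspected at run time. This says nothing about how purrr::compose stores its inputs; it only applies to mycompose as written above.

```r
mycompose <- function(f, g) {
  force(f)
  force(g)
  function(...) f(g(...))
}

myfunc <- mycompose(sqrt, abs)

# Calling the composed function runs abs first, then sqrt
result <- myfunc(-4)

# The captured functions sit in the closure's enclosing environment
captured <- mget(c("f", "g"), envir = environment(myfunc))
```

So a static look at `myfunc <- mycompose(sqrt, abs)` shows only one call, while the object it creates carries `sqrt` and `abs` along with it.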

Hmm, this is tricky. A short-term solution I can imagine: you could require your students to use the formula notation when using something like purrr::map_*(), so in this case they'd be forced to write:

my_factorial <- function(x) purrr::map_dbl(x, ~factorial(.x))

You could check whether they actually follow this rule by inspecting the arguments of any map_* function and requiring that the function argument be a call (you can then examine what that call is). This doesn't solve the long-term problem of how to deal with these issues outside the classroom, but maybe it will help your specific use case?
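A rough sketch of that check in base R, under a few loud assumptions: `map_fun_args` is a hypothetical helper (not part of tidycode, matahari, or purrr), it assumes the function is passed as the second positional argument of the map_* call, and it does not handle calls with empty arguments such as `x[, 1]`.

```r
# Hypothetical helper: collect whatever is passed as the second argument
# of any map / map_* call found in an expression
map_fun_args <- function(expr) {
  found <- list()
  walk <- function(e) {
    if (!is.call(e)) return(invisible(NULL))
    head <- e[[1]]
    # Handle both map_dbl(...) and purrr::map_dbl(...)
    fname <- if (is.call(head) && identical(head[[1]], as.name("::"))) {
      as.character(head[[3]])
    } else if (is.name(head)) {
      as.character(head)
    } else {
      ""
    }
    if (grepl("^map(_|$)", fname) && length(e) >= 3) {
      found[[length(found) + 1]] <<- e[[3]]
    }
    for (part in as.list(e)) walk(part)
  }
  walk(expr)
  found
}

with_formula <- quote(purrr::map_dbl(x, ~factorial(.x)))
with_symbol  <- quote(purrr::map_dbl(x, factorial))

formula_ok <- vapply(map_fun_args(with_formula), is.call, logical(1))  # formula is a call to `~`
symbol_ok  <- vapply(map_fun_args(with_symbol), is.call, logical(1))   # bare symbol, not a call

# Once the rule holds, the names inside the formula can be examined
inner_names <- all.names(map_fun_args(with_formula)[[1]])
```

With the rule enforced, `inner_names` contains `factorial`, so the hidden call becomes visible again.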

Yeah, I thought of that already, but it's a bit unsatisfying, as factorial is arguably preferable to ~factorial(.x) when I'm trying to teach the students to avoid things that are more complex than they need to be.

In general, I find it difficult to analyze stuff coded using functional style, as there is no straightforward way to analyze what's going on when functions are used as input/output of other functions rather than called directly.

If I were in, e.g., C++, it would be straightforward, since function signatures would give you the information needed to parse and analyze the code. In R, one would need a way to know which arguments are supposed to be functions, which is highly non-trivial... Still, if you think about it, the output of "purrr::map_dbl(mtcars, mean)" %>% dance_recital() %>% unnest_calls(expr) is surprisingly unhelpful, given that arbitrarily complex things could be happening if I were to replace mean with an arbitrary function. Essentially, ANYTHING could be hidden behind something implemented in a functional style.
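One imperfect run-time heuristic, sketched in base R: list every name in the expression and keep those that currently resolve to a function. `function_like_names` is a hypothetical helper, and the result depends on which packages are attached and on any shadowing in `env`, so it can only suggest which symbols might be functions.

```r
# Hypothetical helper: names in `expr` that resolve to a function in `env`
function_like_names <- function(expr, env = globalenv()) {
  nms <- unique(all.names(expr))
  is_fun <- vapply(nms, exists, logical(1), envir = env, mode = "function")
  nms[is_fun]
}

# Base R analogue of the example above, so it does not depend on purrr being attached
candidates <- function_like_names(quote(sapply(mtcars, mean)))
```

Here `mean` is reported alongside `sapply`, while `mtcars` is not. With `quote(map_dbl(mtcars, mean))`, `map_dbl` would only be reported if purrr is attached.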

Maybe the issue lies with matahari, which isn't "spying properly", rather than with tidycode, which simply unwraps what dance_recital captures?

Yes, I definitely see your point re: factorial being preferable to ~factorial(.x). And I agree, this issue probably lies with the matahari package (you can submit an issue for that package here: https://github.com/jhudsl/matahari/issues). I am also an author there, so we may still end up in a conversation about this; maybe the maintainer, Sean, will have some better ideas! I'm going to close this issue now, but feel free to reference it if you open one for matahari.