duncantl/CodeDepends

getInputs Incorrect for Loops with Conditionals

Closed this issue · 4 comments

Inputs are not detected correctly for loops that contain if-statements. Minimal example:

ex = quote(
  for (i in 1:10) {
      if (i == 2)
          a = 3
      b = a
  }
)

getInputs(ex)

Here a should be detected as an input, but isn't.

@nick-ulle a is only conditionally an input here. CodeDepends would need to know (or assume) that the condition in the if statement won't (always) be true to determine that a is an input.

BTW this doesn't have anything to do with the for loop, it's how if-statements are handled currently that is causing the issue.

> ex = quote(if(i==2)a=3)
> ex
if (i == 2) a = 3
> getInputs(ex)
An object of class "ScriptNodeInfo"
<snip>

Slot "inputs":
[1] "i"

Slot "outputs":
[1] "a"

<snip>

Slot "code":
if (i == 2) a = 3

Since a is an output of the if statement, it is available later within the {} block when b=a is processed in ex2 below, Thus not a global input to the codeblock.

> ex2 = quote({if(i==2) a=3; b=a})
> getInputs(ex2)
An object of class "ScriptNodeInfo"
<snip>

Slot "inputs":
[1] "i"

Slot "outputs":
[1] "a" "b"

Slot "updates":
character(0)

Slot "functions":
 { if == 
NA NA NA 

<snip>

Slot "code":
{
    if (i == 2) 
        a = 3
    b = a
}

BTW, shameless plug, my PR makes it very easy to experiment with different handlings of if calls, just need a custom handler. I'm still hoping to have it accepted.

Moving forward, I see two options. One is to implement the concept of conditional/possible outputs, and machinery to allow that to propogate properly. Difficult but most correct. The other is to decide on an assumption (conditional outputs are always outputs or never outputs) which causes us the least amount of pain based on what we're trying to do.

Best,
~G

The purpose of the for-loop is that a is not a conditional input (the first iteration) and this can be inferred statically. Although we could certainly unroll the loop to get a simpler example:

i = 1
if (i == 2)
  a = 3
b = a

i = 2
if (i == 2)
  a = 3
b = a

It's also not entirely clear to me why here i is marked as an output (which seems correct), whereas in the for-loop example, i is marked as an input.

At any rate, it seems like the conservative thing to do when handling inputs is to assume that no branches of a conditional will be visited and when handling outputs is to assume that all branches of a conditional will be visited. I might be wearing blinders, though, since I'm looking at this from a code optimization/compiler perspective.

Even better would be to track conditional inputs/outputs separately, so that information is preserved.

@nick-ulle you're absolutely right about the i handling, of course. I had missed that as I was focused on a and b.

I've pushed a fix to my repo (which should add it to the open PR) for recognizing that i is an output. I haven't done anything beyond that in terms of loop comprehension, as that is obviously much more in your wheelhouse than mine, but (with the PR) it should be very easy to build on what I put in and customize it further.

I've also added a (non-default) if handler that has the conservative, compiler-focused behavior you suggest CodeDepends:::ifforcomp. We can play with it and think about whether it makes sense as the default or not. Using that and the lop fix, we get your desired result, though again, that is because all conditional outputs are being ignored, not because it is smart enough to tell the condition is guaranteed to be false first.

> library(CodeDepends)
> ex = quote(for(i in 1:10) { if(i==2) a = 3; b=a})
> col = inputCollector("if" = CodeDepends:::ifforcomp)
> getInputs(ex, col)
An object of class "ScriptNodeInfo"
<snip>
Slot "inputs":
[1] "a"

Slot "outputs":
[1] "i" "b"

Slot "updates":
character(0)

Slot "functions":
for   :   {  if  == 
 NA  NA  NA  NA  NA 

<snip>

Slot "code":
for (i in 1:10) {
    if (i == 2) 
        a = 3
    b = a
}

Currently CodeDepends doesn't have a sort of value registry which tracks known values of symbols (those that were assigned constant values, with recursive propogation via direct assignment). I can see in my head how to do a first pass on that based on what is there now. I haven't put any of it down yet, mostly because I didn't want to charge ahead and stomp all over your toes research-wise.

@nick-ulle I'm going to close this. Please feel free to re-open it if appropriate.