yonicd/sinew

Is there a way to ignore internally nested functions?

njtierney opened this issue · 3 comments

Thanks for this package, @yonicd ! I had already written a bit of code using XML to do something similar, really glad this exists.

I was wondering if there is a way to tell untangle to ignore nested functions? For example:

# The two (baseline) generation intervals of Ganyani et al.
ganyani_cdf <- function(which) {

  gi_params <- ganyani_gi(which)

  beta <- gi_params$mean$est / (gi_params$sd$est ^ 2)
  alpha <- gi_params$mean$est * beta

  gi_cdf <- function(days) {
    pgamma(days, alpha, beta)
  }

  gi_cdf

}

Calling untangle on this creates two files:

  1. ganyani_cdf.R
  2. gi_cdf.R

I think it might be possible within XML to identify ancestors/descendants, but I'm not sure about the parse data approach you are using.

Cheers!

Hi @njtierney. An xml approach is cool. There could be motivation to import {xmlparsedata} for such problems. But, if that package can infer hierarchy from the parsedata I’m guessing it is possible to solve this issue within the confines of this function.
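For reference, a minimal sketch of how hierarchy can be read straight from the parse data, without XML. It assumes only base R's `utils::getParseData()`, whose `id` and `parent` columns link every token to its enclosing expression. The helper `ancestors()` and the shortened example code are illustrative and not part of sinew.

```r
# Shortened version of the example above (the gamma parameters are placeholders).
code <- '
ganyani_cdf <- function(which) {
  gi_cdf <- function(days) {
    pgamma(days, 1, 1)
  }
  gi_cdf
}
'

pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# The parent of each `function` keyword is the expression holding that definition.
fun_expr_ids <- pd$parent[pd$token == "FUNCTION"]

# Hypothetical helper: walk up the parent chain and collect all ancestor ids.
ancestors <- function(id, pd) {
  out <- integer(0)
  p <- pd$parent[pd$id == id]
  while (length(p) == 1 && p > 0) {
    out <- c(out, p)
    p <- pd$parent[pd$id == p]
  }
  out
}

# A definition is nested when one of its ancestors is itself a function definition.
is_nested <- vapply(
  fun_expr_ids,
  function(id) any(ancestors(id, pd) %in% fun_expr_ids),
  logical(1)
)

# Starting line of each definition next to its nested flag.
data.frame(
  line = pd$line1[match(fun_expr_ids, pd$id)],
  nested = is_nested
)
```

The outer definition is flagged `FALSE` and the inner one `TRUE`, so filtering on `!is_nested` would keep only the top-level functions.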

I’ll play a bit with the example you wrote. Is the problem that the nested function has to be in the environment of the parent to work in this case?
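To illustrate the environment question, here is a small sketch of why a function like `gi_cdf` depends on its parent. The names `make_cdf` and `gi_cdf_standalone` are made up for the illustration, and the standalone copy is given an environment that only sees base R, to mimic a file that is sourced without the parent's variables.

```r
# The inner function is a closure: it finds alpha and beta in the
# environment created by the call to the outer function.
make_cdf <- function(alpha, beta) {
  gi_cdf <- function(days) {
    stats::pgamma(days, alpha, beta)
  }
  gi_cdf
}

cdf <- make_cdf(alpha = 2, beta = 0.5)
inside <- cdf(3)  # works, alpha and beta come from the enclosing call

# The same body pulled out on its own has nowhere to find alpha and beta.
gi_cdf_standalone <- function(days) {
  stats::pgamma(days, alpha, beta)
}
# Assumption for the demo: isolate the function from the global environment.
environment(gi_cdf_standalone) <- new.env(parent = baseenv())

outside <- tryCatch(gi_cdf_standalone(3), error = function(e) e)
inherits(outside, "error")
```

So a nested function written to its own file can stop working, which is a reason to leave it inside its parent.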

Not sure what the default would be in such cases, since the intent is to disentangle all functions from each other.

Another solution could be to have the user set an ignore list (similar to how I handled things in pretty_namespace), but that can be onerous.

so there is a simple way to add a nested check into untangle as it is.

untangle locates the rows that each function is defined on. I assumed the lines wouldn't intersect and did not create a contingency for it.

but in your use case they do. A solution would be to check if the text elements overlap. If one of the list objects fully overlaps with another, it would be flagged as a nested function and ignored.

adding a breakpoint here on your example when untangling:

p.split
[[1]]
[[1]]$name
[1] "ganyani_cdf"

[[1]]$text
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14


[[2]]
[[2]]$name
[1] "gi_cdf"

[[2]]$text
[1]  8  9 10
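A minimal sketch of that overlap check, run on a list rebuilt by hand to match the `p.split` printout above. It is a standalone illustration, not the code inside untangle, and `is_nested` and `top_level` are names chosen for the example.

```r
# Hand-built copy of the structure shown at the breakpoint.
p.split <- list(
  list(name = "ganyani_cdf", text = 1:14),
  list(name = "gi_cdf", text = 8:10)
)

# An element is nested when all of its lines fall inside another element's lines.
is_nested <- vapply(seq_along(p.split), function(i) {
  others <- seq_along(p.split)[-i]
  any(vapply(others, function(j) {
    all(p.split[[i]]$text %in% p.split[[j]]$text)
  }, logical(1)))
}, logical(1))

# Keep only the functions that are not contained in another one.
top_level <- p.split[!is_nested]
vapply(top_level, function(el) el$name, character(1))
```

Because every element is compared against all others, the result does not depend on the order of the list. Two elements with identical line ranges would both be flagged, so that case would need separate handling.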

this is the solution i got to. any suggestions to make this more efficient are welcome

set.seed(123)
x <- 1:30
n <- sample(2:5,1)
s <- split(x, sort(rep_len(1:n, length(x))))
f <- function(x,l) seq.int(x, x + l)
s_sub <- lapply(c(s,s),function(x){
  f(x[sample(seq(floor(length(x)/2)),1)],sample(2:4,1))
})

s_bind <- append(s,s_sub)

details::details(s,summary = 'parents')
parents
$`1`
[1] 1 2 3 4 5 6 7 8

$`2`
[1]  9 10 11 12 13 14 15 16

$`3`
[1] 17 18 19 20 21 22 23

$`4`
[1] 24 25 26 27 28 29 30

details::details(s_sub,summary = 'children')
children
$`1`
[1] 3 4 5 6 7

$`2`
[1] 10 11 12 13 14

$`3`
[1] 18 19 20 21

$`4`
[1] 25 26 27 28 29

$`1`
[1] 1 2 3 4

$`2`
[1] 10 11 12

$`3`
[1] 18 19 20 21 22

$`4`
[1] 24 25 26 27 28

details::details(s_bind,summary = 'combining the two')
combining the two
$`1`
[1] 1 2 3 4 5 6 7 8

$`2`
[1]  9 10 11 12 13 14 15 16

$`3`
[1] 17 18 19 20 21 22 23

$`4`
[1] 24 25 26 27 28 29 30

$`1`
[1] 3 4 5 6 7

$`2`
[1] 10 11 12 13 14

$`3`
[1] 18 19 20 21

$`4`
[1] 25 26 27 28 29

$`1`
[1] 1 2 3 4

$`2`
[1] 10 11 12

$`3`
[1] 18 19 20 21 22

$`4`
[1] 24 25 26 27 28

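# Walk the list in order and drop every later element that is
# fully contained in element i (assumes parents come before children).
i <- 1
while (i < length(s_bind)) {
  rem_i <- vector(mode = 'numeric')
  # seq.int() avoids the backwards sequence that (i+1):length() gives at the last element
  for (ii in seq.int(i + 1, length(s_bind))) {
    # no lines outside of element i means element ii is nested in it
    if (length(setdiff(s_bind[[ii]], s_bind[[i]])) == 0) {
      rem_i <- c(rem_i, ii)
    }
  }
  # keep scanning even when element i has no children
  s_bind[rem_i] <- NULL
  i <- i + 1
}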

s_bind
#> $`1`
#> [1] 1 2 3 4 5 6 7 8
#> 
#> $`2`
#> [1]  9 10 11 12 13 14 15 16
#> 
#> $`3`
#> [1] 17 18 19 20 21 22 23
#> 
#> $`4`
#> [1] 24 25 26 27 28 29 30

Created on 2021-08-14 by the reprex package (v2.0.1)