Quosure-based resamplr
jrnold opened this issue · 2 comments
Use the new rlang quosures and tidy eval for resampling functions. Quosures are unevaluated expressions, so they don't take up much memory, and they keep the environment in which they should be evaluated, so we don't need to rely on R's internal copy-on-modify mechanics. The latter is nice since I don't want to rely on something that is a little magical and can easily break without me noticing.
I'm still uncertain about what this looks like, so this issue includes (will include) comments as I puzzle through it.
What any resampling method needs:
- Number of elements to resample from or a vector of identifiers. There are two general classes of resampling methods:
- ungrouped: need either the number of indexes or a vector of identifiers. Note: I
- grouped: need either an integer vector (values are the number of elements in each group, length is the number of groups) or a list of vectors (values are vectors of identifiers).
- An extraction function: a function of two arguments, x (the object to extract from) and idx (which gives the elements to extract).
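As a rough illustration of these two ingredients for a data frame, here is a minimal base R sketch. The helper names ids_of and extract_rows are hypothetical and only stand in for whatever the package ends up providing.

```r
# Hypothetical helpers: neither name comes from resamplr.
# Identifiers for a data frame: here simply the row positions.
ids_of <- function(x) seq_len(nrow(x))

# Extraction function of two arguments: the object and the elements to keep.
extract_rows <- function(x, idx) x[idx, , drop = FALSE]

ids <- ids_of(mtcars)
# One bootstrap-style draw: identifiers sampled with replacement, then extracted.
smpl <- extract_rows(mtcars, sample(ids, replace = TRUE))
```

The resampling algorithm only ever sees ids; the object itself is touched only by the extraction function.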
Resampling object
Yeah, so quosures are awesome, but how would this work?
The function that creates them will look something like
create_samples(expr, ...)
The expr argument can be a quosure, or we just grab it unevaluated and capture the environment.
Provide a single expression
create_samples(~x, ...)
The expression can be evaluated to determine its type, and the appropriate functions can then be applied to get the identifiers and extract elements from the object. These could also be optionally provided.
I could just have the user write an expression where .idx stands in for the indexes to be provided.
create_samples(~ x[.idx, , drop = FALSE], list(1:5, 1:8))
The problem with this is that the user then needs to provide the identifiers to sample. However, they don't need to provide a separate extraction method. This is very general, but can be somewhat redundant, since the user effectively has to write the extraction function every time they use it.
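One way the .idx stand-in could be evaluated is with a data mask, sketched below. This create_samples() is an assumption about the interface, not the package's implementation; it uses rlang's f_rhs(), f_env() and eval_tidy() to evaluate the right-hand side of the formula in the formula's own environment, with .idx supplied for each set of indexes.

```r
library(rlang)

# Hypothetical sketch: the interface and return type are assumptions.
create_samples <- function(f, idx_list) {
  stopifnot(is_formula(f))
  lapply(idx_list, function(idx) {
    # Return a function so that nothing is extracted until it is called.
    function() eval_tidy(f_rhs(f), data = list(.idx = idx), env = f_env(f))
  })
}

x <- mtcars
smpls <- create_samples(~ x[.idx, , drop = FALSE], list(1:5, 1:8))
nrow(smpls[[1]]())
nrow(smpls[[2]]())
```

Only the expression, its environment, and the indexes are stored; the rows are materialized when a sample function is called.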
Two ideas for the object itself
- A quosure object or a subclass of quosures. To draw the samples the user needs to call some function, either eval_tidy or a wrapper provided by this package.
- A subclass of function(). This has the nice feature that to evaluate it, the user only has to
The identifiers can be extracted via a function since in either case they'll reside in some environment.
Resampling functions
Expose all the lower-level functions which work on only identifiers or number of obs.
The following functions for each resampling algorithm could be written:
bootstrap(x, ...)
bootstrap_idx(idx, ...)
bootstrap_n(n, ...)
Where x is some arbitrary object, idx is a vector of identifiers, and n is the number of elements. The bootstrap_n form is the lower level function since it is all that is needed for the resampling algorithm, and bootstrap_idx will simply apply bootstrap_n to vectors of identifiers.
Then bootstrap() is only responsible for providing a lower level function with the number of elements or a list of identifiers.
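The layering described above could look roughly like this. It is a minimal sketch: the argument R (number of replicates) and the assumption that bootstrap() receives a data frame are mine, not something settled in this issue.

```r
# Minimal sketch; the argument `R` (number of replicates) is an assumption.
bootstrap_n <- function(n, R = 1L) {
  # Each replicate is n positions drawn with replacement from 1..n.
  replicate(R, sample.int(n, n, replace = TRUE), simplify = FALSE)
}

bootstrap_idx <- function(idx, R = 1L) {
  # Resample positions, then map them back to the identifiers.
  lapply(bootstrap_n(length(idx), R), function(i) idx[i])
}

# bootstrap() only works out the identifiers and delegates.
# Assumes `x` is a data frame whose identifiers are its row positions.
bootstrap <- function(x, R = 1L) {
  bootstrap_idx(seq_len(nrow(x)), R)
}

b <- bootstrap_idx(c("a", "b", "c"), R = 4)
```

Because bootstrap_idx() indexes by position, it never has to interpret the identifiers themselves.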
To handle groups: bootstrap() is a generic function with methods:
- list: use grouped bootstrapping
- default: use non-grouped bootstrapping
There is some ambiguity if for some reason identifiers have to be a list, but that's tough shit. Identifiers should be atomic vectors. If a list is really needed, the user needs to deal with it in the extraction function.
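A sketch of the list/default dispatch with S3 follows. Two things here are assumptions: the generic is called bootstrap_ids (a hypothetical name) and dispatches on the identifiers, and "grouped" is taken to mean resampling whole groups with replacement; resampling within groups would be an equally plausible reading.

```r
# Sketch only: dispatching on the identifiers and resampling whole groups
# are both assumptions.
bootstrap_ids <- function(idx, ...) UseMethod("bootstrap_ids")

# Ungrouped: draw identifiers with replacement.
bootstrap_ids.default <- function(idx, ...) {
  idx[sample.int(length(idx), replace = TRUE)]
}

# Grouped: draw whole groups with replacement, keeping each group intact.
bootstrap_ids.list <- function(idx, ...) {
  groups <- sample.int(length(idx), replace = TRUE)
  unlist(idx[groups], use.names = FALSE)
}

ungrouped <- bootstrap_ids(1:10)
grouped <- bootstrap_ids(list(1:3, 4:6, 7:10))
```

Any list is treated as groups here, which is exactly why identifiers themselves need to be atomic vectors.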
bootstrap_idx(idx, ...)
bootstrap_n(n, ...)
In base R, there is sample and sample.int, but I can't use that naming convention since . should be reserved for S3 methods, and something like sample_int would suggest that it returns integer values, e.g. map_int in purrr.
One idea would be to have a single generic function with methods, and internal logic that does different things for a scalar integer. I cannot treat a vector with length one as the number of obs, since it doesn't handle the edge case of a single integer identifier.
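Base R's sample() has this same ambiguity, which shows why the edge case matters: a single positive integer is treated as a count rather than as a value to sample from. A short illustration, where ids is just an example variable:

```r
ids <- 10L

# A length-one numeric >= 1 is treated as sample(1:10): a permutation of 1..10.
x <- sample(ids)
length(x)

# Indexing by position treats the value as an identifier regardless of length.
y <- ids[sample.int(length(ids), replace = TRUE)]
y
```

Keeping separate entry points for a count and for a vector of identifiers avoids having to guess which one the caller meant.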
Notes
I must ensure that resampling a resample object works
Some messing around with this:
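library("rlang")

# Turn a formula lambda into a function and bind the indexes to it.
create_samples <- function(q, idx) {
  f <- as_function(q)
  print(f)  # debugging: show the generated function
  # Return a zero-argument function that draws the sample when called.
  function() f(idx)
}

# `.` stands in for the indexes; drop = FALSE keeps the result a data frame.
smpl <- create_samples(~ mtcars[., , drop = FALSE], 1:5)
smpl()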
Suppose the user provides a function to generate the item, and a function to extract samples:
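# Capture the data expression and build a quosure that extracts rows `idx`.
create_samples2 <- function(data, idx) {
  q <- enquo(data)
  .f <- function(x, i) x[i, , drop = FALSE]
  # `.f` and `idx` are looked up in this function's environment.
  out <- quo(.f(!!q, idx))
  structure(out, expr = q, idx = idx, class = c("resamplq", class(out)))
}

# Collapse a vector to a string, showing at most `n` elements.
print_vec <- function(x, n = length(x)) {
  if (length(x) > n) x <- c(x[seq_len(n)], "...")
  paste(x, collapse = ", ")
}

print.resamplq <- function(x, ...) {
  # Read the indexes from the attribute set in create_samples2().
  idx <- print_vec(attr(x, "idx"), 10)
  # quo_get_expr() avoids set_attrs(), which later rlang releases deprecate.
  expr <- deparse(quo_get_expr(attr(x, "expr")))
  cat("<resample: ", expr, "> ", idx, "\n", sep = "")
  invisible(x)
}

q <- create_samples2(mtcars, 1:20)
print(q)
eval_tidy(q)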
The required inputs for this are:
- expression (quosure) to calculate data
- index values - if not explicit values, as in bootstrap or other algorithms, then the number of observations, or a list of values.
- function of two args, data and idx, to extract the values
create_samples3 <- function(data, idx, ...) {
  q <- enquo(data)
  .f <- function(x, i) x[i, , drop = FALSE]
  # I think this would be better with new_function
  # but I was having a hard time setting the body
  out <- function() eval_tidy(quo(.f(!!q, idx)))
  structure(out, expr = q, idx = idx)
}

smpl3 <- create_samples3(mtcars, 1:5)
smpl3
smpl3()
Question: how to specify the indexes and the extraction function?
- Evaluate expression and dispatch method
- this requires the expression to include objects that exist
- it's not necessary, but it may require the
- Require user to provide number or list of indexes and an extraction function.