Random selection from an iterator/iterable object
Closed this issue · 3 comments
In many cases, it makes a lot of sense to randomly select from some object (or iterator) by sampling the indices of the object. Example:
set.seed(42)
n <- nrow(iris)
indices <- seq_len(n)
train_idx <- sample(indices, floor(2/3 * n))  # floor() keeps the sample size a whole number
train_data <- iris[train_idx, ]
test_data <- iris[-train_idx, ]
If n is extremely large, the indices vector becomes extremely large. To avoid this overhead, it makes sense to have some interface like:
it <- isample(iseq_len(n), 50)
as.list(it) # list of 50 sampled elements
Given that iterators are sequential in nature, sampling an exact number of elements is difficult if n is unknown. If n were known, this would reduce to binomial or hypergeometric sampling, depending on whether the sampling is with or without replacement.
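For the unknown-n case, one standard option is reservoir sampling (Algorithm R), which draws exactly k elements without replacement in a single pass using O(k) memory. This is only a minimal sketch of that technique, not the interface proposed above: `make_counter` and `reservoir_sample` are hypothetical names, and the closure-based iterator (returning NULL when exhausted) is an assumption made to keep the example free of package dependencies.

```r
# Hypothetical helper: a minimal closure-based iterator over 1..n.
# Assumption: it returns NULL once exhausted.
make_counter <- function(n) {
  i <- 0L
  function() {
    if (i >= n) return(NULL)
    i <<- i + 1L
    i
  }
}

# Reservoir sampling (Algorithm R): draw k items uniformly, without
# replacement, from a stream whose length is not known in advance.
reservoir_sample <- function(next_elem, k) {
  reservoir <- vector("list", k)
  seen <- 0L
  repeat {
    x <- next_elem()
    if (is.null(x)) break
    seen <- seen + 1L
    if (seen <= k) {
      # Fill the reservoir with the first k items.
      reservoir[[seen]] <- x
    } else {
      # Keep the new item with probability k / seen,
      # overwriting a uniformly chosen slot.
      j <- sample.int(seen, 1L)
      if (j <= k) reservoir[[j]] <- x
    }
  }
  # If the stream held fewer than k items, return only what was seen.
  reservoir[seq_len(min(seen, k))]
}

set.seed(42)
picked <- reservoir_sample(make_counter(1e5), 50)
length(picked)
```

The cost is that every element still has to be visited once, so this saves memory but not time. The returned order is also not a random permutation, so shuffle the result if order matters.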
Hmm, think on this before moving forward. Scrap the idea?
The idea makes sense in some contexts though -- randomly selecting from the Cartesian product as in Python's random_product itertools recipe. This makes even more sense when the Cartesian product from expand.grid is HUGE, and we care only about some random subset.
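A rough sketch of what that could look like in R, assuming the same approach as the Python recipe: pick one value independently and uniformly from each pool, which gives a uniform draw from the product without ever building it. `random_product` and `sample_grid` are hypothetical names here, not existing functions, and `sample_grid` samples with replacement, so rows can repeat.

```r
# One random element of the Cartesian product: pick one value from each pool.
random_product <- function(...) {
  pools <- list(...)
  lapply(pools, function(pool) pool[[sample.int(length(pool), 1L)]])
}

# `size` random rows of expand.grid(...) without materializing the grid.
# Assumption: sampling is with replacement, so duplicate rows are possible.
sample_grid <- function(..., size) {
  pools <- list(...)
  cols <- lapply(pools, function(pool) {
    pool[sample.int(length(pool), size, replace = TRUE)]
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

set.seed(42)
# The full grid would have 1000 * 26 * 2 = 52000 rows; only 5 are generated.
g <- sample_grid(x = 1:1000, y = letters, z = c(TRUE, FALSE), size = 5)
rp <- random_product(1:3, c("a", "b"))
```

Sampling without replacement from the product is harder: one option would be to draw distinct row indices and decode each into per-pool positions, but that is not shown here.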
Passing on this issue. PITA.