An R/Rust package to implement an exhaustive tree search for policy learning. It aims to extend and speed up the work done in the `policytree` package.
There’s just one function, `sparse_policy_tree()`. Use it just as you would the `policy_tree()` function from the `policytree` package.
Trees aren’t guaranteed to be exactly the same as those produced by `policytree`, in that some leaves may be left unpruned (a fix for this is in progress), but the two packages should give the same predictions.
``` r
n <- 400
p <- 4
d <- 3
depth <- 2

# Classification task taken from policytree tests
# Continuous X
X <- round(matrix(rnorm(n * p), n, p), 2)
colnames(X) <- letters[1:ncol(X)]
Y <- matrix(0, n, d)
best.tree <- policytree:::make_tree(X, depth = depth, d = d)
best.action <- policytree:::predict_test_tree(best.tree, X)
Y[cbind(1:n, best.action)] <- 100 * runif(n)
best.reward <- sum(Y[cbind(1:n, best.action)])

tree <- sparse_policy_tree(X, Y, 2)

tree
#> policy_tree object
#> Tree depth:  2
#> Actions:  1 2 3
#> Variable splits:
#> (1) split_variable: b  split_value: -0.73
#>   (2) split_variable: a  split_value: -3.36
#>     (4) * action: 3
#>     (5) * action: 3
#>   (3) split_variable: a  split_value: 0.57
#>     (6) * action: 3
#>     (7) * action: 2
```
You must have Rust installed to compile this package. The Rust website provides an excellent installation script that has never caused me any issues.

On Linux, you can install Rust with:

``` sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
On Windows, I use the Rust installation wizard, found here.
Once you’ve installed Rust, you should be able to install the package with:

``` r
devtools::install_github("Yale-Medicaid/sparsepolicytree")
```
Below are the results from a series of benchmarks to gauge the speed of the package. These were run on an old 4-core, 8-thread server that I have access to, and I think should be pretty representative of the speed a user can expect to see if they run the algorithm on all cores of a modern data-science laptop.
| Number of Observations | Number of Distinct Predictor Values | Number of Predictors | Number of Treatments | Time |
|---|---|---|---|---|
| 10^2 | 2 | 30 | 20 | 0.003s |
| 10^3 | 2 | 30 | 20 | 0.015s |
| 10^4 | 2 | 30 | 20 | 0.15s |
| 10^5 | 2 | 30 | 20 | 1.57s |
| 10^6 | 2 | 30 | 20 | 17s |
| 10^2 | 30 | 30 | 20 | 0.052s |
| 10^3 | 30 | 30 | 20 | 0.240s |
| 10^4 | 30 | 30 | 20 | 3.041s |
| 10^5 | 30 | 30 | 20 | 100s |
| 10^6 | 30 | 30 | 20 | 21 min |
| 10^2 | 10^2 | 30 | 20 | 0.41s |
| 10^3 | 10^3 | 30 | 20 | 40s |
| 10^4 | 10^4 | 30 | 20 | 70 min |
`sparse_policy_tree` is dominant when the number of values a variable can take is small (say, under 200), but performs poorly when the number of unique values is large. For dense variables, you are generally better off using the `policytree` package, which is more developed and will return faster while running on only a single core.
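If a continuous predictor has too many distinct values for the exhaustive search to be practical, one general workaround (a sketch, not a feature of this package; the bucket count of 20 is illustrative) is to coarsen it into quantile-based bins before calling `sparse_policy_tree()`:

``` r
# Coarsen each column of a continuous design matrix into at most
# n_bins quantile-based buckets, replacing each value with its
# bucket midpoint so the column keeps a sensible numeric scale.
bin_column <- function(x, n_bins = 20) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1)))
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  mids[cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)]
}

set.seed(1)
X <- matrix(rnorm(1000 * 4), 1000, 4)
X_binned <- apply(X, 2, bin_column)

# Each column now has at most 20 distinct values,
# which is the regime where the exhaustive search is fast.
max(apply(X_binned, 2, function(x) length(unique(x))))
```

This trades some split-point resolution for a large speedup, since the search cost grows with the number of distinct values per predictor.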
To constrain the number of cores the program uses, you can set the `RAYON_NUM_THREADS` environment variable before running the search. At present, this variable is read when the multithreading thread pool is constructed, and so must be set once per R session. In a future version, I’ll work to include a fix so that the number of threads can be passed as an argument to the `sparse_policy_tree` function.
Here’s an example of how to set the search to run on a single core in R:

``` r
Sys.setenv(RAYON_NUM_THREADS = 1)
```
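Because the variable must be visible when the thread pool is first created, you can equivalently set it in the shell environment that launches R (the `--no-save` flag here is just an illustrative way to start a session):

``` sh
# Fix Rayon's thread count for the whole session by exporting the
# variable before R starts; Rayon reads it once, at pool creation.
RAYON_NUM_THREADS=1 R --no-save
```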