/sparsepolicytree

Fast exhaustive tree search for policy learning

Primary LanguageRustGNU General Public License v3.0GPL-3.0

Sparse Policy Tree

An R / Rust Package to implement an exhaustive tree search for policy-learning. Aims to extend and speed up work done with the policytree package.

Usage

There’s just one function, sparse_policy_tree(). Use it just as you would the policy_tree() function from the policytree package.

Trees aren’t guaranteed to be exactly the same as those produced by policytree in that some leaves may be left unpruned (working on this), but they should give the same predictions.

n <- 400
p <- 4
d <- 3
depth <- 2
# Classification task taken from policytree tests
# Continuous X
X <- round(matrix(rnorm(n * p), n, p),2)
colnames(X) <- letters[1:ncol(X)]
Y <- matrix(0, n, d)
best.tree <- policytree:::make_tree(X, depth = depth, d = d)
best.action <- policytree:::predict_test_tree(best.tree, X)
Y[cbind(1:n, best.action)] <- 100 * runif(n)
best.reward <- sum(Y[cbind(1:n, best.action)])

tree <- sparse_policy_tree(X,Y,2)

tree
#> policy_tree object 
#> Tree depth:  2 
#> Actions:  1 2 3 
#> Variable splits: 
#> (1) split_variable: b  split_value: -0.73 
#>   (2) split_variable: a  split_value: -3.36 
#>     (4) * action: 3 
#>     (5) * action: 3 
#>   (3) split_variable: a  split_value: 0.57 
#>     (6) * action: 3 
#>     (7) * action: 2

Installation:

Installing Rust:

You must have Rust installed to compile this package. The rust website provides an excellent installation script that has never caused me any issues.

On Linux, you can install Rust with:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

On Windows, I use the rust installation wizard, found here.

Installing Package from Github:

Once you install rust, you should be able to install the package with:

devtools::install_github("Yale-Medicaid/sparsepolicytree")

Benchmarks:

Below are the results from a series of benchmarks to gauge the speed of the package. These were run on an old 4-core, 8-thread server that I have access to, and I think should be pretty representative of the speed a user can expect to see if they run the algorithm on all cores of a modern data-science laptop.

Number of Observations Number of Distinct Predictor Values Number of Predictors Number of Treatments Time
10^2 2 30 20 0.003s
10^3 2 30 20 0.015s
10^4 2 30 20 0.15s
10^5 2 30 20 1.57s
10^6 2 30 20 17s
10^2 30 30 20 0.052s
10^3 30 30 20 0.240s
10^4 30 30 20 3.041s
10^5 30 30 20 100s
10^6 30 30 20 21 min
10^2 10^2 30 20 0.41s
10^3 10^3 30 20 40s
10^4 10^4 30 20 70 min

sparse_policy_tree is dominant when the number of values a variable can take is small (say, under 200), but performs poorly when the number of unique values is large. For dense variables, you are generally better off using the policytree package, which is more developed,and will return faster while running on a only single core.

Limiting the Number of Threads:

To constrain the number of cores the program uses, you can set the RAYON_NUM_THREADS variable before running the search. At present, this variable is read at the construction of the multithreading thread pool, and so must be set once each R session. In a future version, I’ll work to include a fix so that the number of threads can be included in an argument to the sparse_policy_tree function.

Here’s an example of how to set the function to run on an single core in R:

Sys.setenv(RAYON_NUM_THREADS=1)