grf-labs/grf

Change how sample weights and forest weights interact

erikcs opened this issue · 0 comments

As @swager pointed out with the following example, we may need to re-evaluate how sample weights (#418) and forest weights interact:

library(grf)

n <- 2000
p <- 5
obs.prob <- 1 / 20
Y0 <- rbinom(n, 1, obs.prob / (1 + obs.prob))
Y <- Y0 + rnorm(n) * 0.01
X <- matrix(rnorm(n * p), n, p)
sample.weights <- 1 + Y0 * (1 / obs.prob - 1)
weighted.mean(Y, sample.weights)
# [1] 0.4779514

rf <- regression_forest(X, Y, sample.weights = sample.weights)
mean(predict(rf)$predictions)
# master (eq. 1): sample weights play virtually no role in small leaves;
# in the special case leaf.size = 1 they play no role at all.
# [1] 0.1631271
# proposed (eq. 2):
# [1] 0.4648844

Replace (1) with (2):

  1. \sum_{i=1}^{n} \alpha_i(x) \psi_{\theta}(\cdot) = 0, where \alpha_i(x) is the forest weight from eq. (3) in https://arxiv.org/pdf/1610.01271.pdf, adjusted by w

  2. \sum_{i=1}^{n} \alpha_i(x) w_i \psi_{\theta}(\cdot) = 0, where \alpha_i(x) is the forest weight from eq. (3) in https://arxiv.org/pdf/1610.01271.pdf

I.e., when sample weights w are passed, use the new kernel weights \alpha'_i(x) = \alpha_i(x) w_i.
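For the regression case, the reweighted estimating equation can be solved in closed form. The short derivation below is a sketch that assumes the regression-forest score \psi_{\theta}(Y_i) = Y_i - \theta; other forest types would need their own score.

```latex
% Assumption: regression-forest score \psi_{\theta}(Y_i) = Y_i - \theta.
\sum_{i=1}^{n} \alpha_i(x)\, w_i\, \bigl(Y_i - \hat{\theta}(x)\bigr) = 0
\quad\Longrightarrow\quad
\hat{\theta}(x) = \frac{\sum_{i=1}^{n} \alpha_i(x)\, w_i\, Y_i}{\sum_{i=1}^{n} \alpha_i(x)\, w_i}
```

So the point prediction becomes a weighted average of the Y_i with the product of forest weight and sample weight as the kernel, renormalized to sum to one.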

This requires updating the point predictions, variance estimates, and error estimates.
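As a rough illustration of what the point-prediction change amounts to in the regression case, here is a toy sketch in base R. It does not call grf: the vector `alpha` is a hypothetical stand-in for the forest weights at a single test point, and all variable names are made up for the example.

```r
# Toy illustration in base R (no grf calls); all names are hypothetical.
Y <- c(0, 0, 1, 0, 1, 0)            # outcomes
w <- c(1, 1, 20, 1, 20, 1)          # sample weights
n <- length(Y)

# Stand-in for the forest weights alpha_i(x) at one test point x.
# Constant weights are an assumption made to keep the arithmetic simple.
alpha <- rep(1 / n, n)

# Prediction that ignores sample weights: sum_i alpha_i(x) * Y_i.
pred.unweighted <- sum(alpha * Y)

# Proposed kernel: multiply forest weights by sample weights, then renormalize.
alpha.w <- alpha * w
pred.weighted <- sum(alpha.w * Y) / sum(alpha.w)

# With constant alpha, the reweighted prediction equals the sample-weighted mean.
stopifnot(isTRUE(all.equal(pred.weighted, weighted.mean(Y, w))))
```

With real forest weights the two quantities would differ, but the reweighted prediction still reduces to a sample-weighted average within whatever neighborhood `alpha` defines.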

Updated: