grf-labs/policytree

Optimal policy based on a subset of variables leads to greater benefit

Closed this issue · 2 comments

Hi team,

I feel like I've used up my quota of questions, but anyway, I'll try again. I have an example where the benefit of an optimal policy based on K variables is smaller than the benefit of an optimal policy based on K-1 of those variables. I am probably missing a major point, but here is what I do:

# Policy var matrices
policy_vars_full <- as.matrix(X %>% select(X1, X2, X3, X4))
policy_vars_reduced <- as.matrix(X %>% select(X1, X2, X3))

# Find optimal trees
tree_full <- policy_tree(policy_vars_full, Gamma.matrix, depth = 2)
tree_reduced <- policy_tree(policy_vars_reduced, Gamma.matrix, depth = 2)

I then calculate the advantage of each policy as

policy_full <- predict(tree_full, X %>% select(all_of(colnames(policy_vars_full))))
policy_reduced <- predict(tree_reduced, X %>% select(all_of(colnames(policy_vars_reduced))))

benefit_full <- mean((policy_full - 1) * Gamma.dr)
benefit_reduced <- mean((policy_reduced - 1) * Gamma.dr)

What confuses me is that in my case benefit_full < benefit_reduced. The difference is very small, and the two are clearly not significantly different from each other. But given that I calculate the benefits on the same sample I used to fit the policies, I don't see where this uncertainty is coming from. Is there a simple explanation, or am I doing something wrong?

Thanks again for the great package, papers and responses!
Hans

Hi @hhsievertsen, no worries, questions help us improve the package. In your example, Gamma.dr is a vector of held-out rewards not used for tree fitting, right? In that case you might observe the reward being either larger or smaller; it can go both ways on a test set.

Thanks @erikcs, in this case I used the same sample for both training and testing. That is why I thought that "mechanically" it had to be that benefit_full >= benefit_reduced.
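That in-sample intuition is indeed correct: every depth-2 tree on X1..X3 is also a valid depth-2 tree on X1..X4, and policy_tree performs an exhaustive search, so the full-variable optimum can never be worse on the training data. A minimal sketch with simulated data (all names and the data-generating process here are illustrative, not from the original script):

```r
library(policytree)

set.seed(1)
n <- 500
X <- matrix(rnorm(n * 4), n, 4,
            dimnames = list(NULL, paste0("X", 1:4)))
# Toy doubly robust reward matrix: treatment (arm 2) helps when X1 > 0
Gamma <- cbind(control = rep(0, n), treated = X[, 1] + rnorm(n, sd = 0.5))

tree_full    <- policy_tree(X, Gamma, depth = 2)
tree_reduced <- policy_tree(X[, 1:3], Gamma, depth = 2)

# In-sample reward of a fitted policy: mean reward of the assigned arm
reward <- function(tree, Xeval) {
  a <- predict(tree, Xeval)
  mean(Gamma[cbind(seq_len(n), a)])
}

# The reduced tree lies inside the full search space, so the exhaustive
# search over X1..X4 weakly dominates in sample:
reward(tree_full, X) >= reward(tree_reduced, X[, 1:3])  # TRUE
```

If the inequality fails on the same sample, that points to a bookkeeping bug (e.g. mismatched columns between fitting and prediction) rather than to estimation noise.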

However, submitting a question here apparently works like pressing the submit button on a paper: I discovered a mistake in my R script, and now the results are in line with expectations. Apologies,

Hans