grf-labs/policytree

Ensure policy tree leaf nodes have minimum sample size per treatment group (balance)


Hello,

Is there a place in the source code where I could modify the min.node.size argument of policy_tree so that each leaf node contains a minimum number of units per treatment group, according to the original treatment labels W that go into grf::causal_forest?

For example, suppose I have 5 treatment groups and I'd like each leaf to contain at least 100 observations from every treatment group. If the data come from a balanced randomized experiment, setting min.node.size=500 may approximately enforce this, but the exact constraint is not guaranteed.
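To make the gap concrete, here is a minimal Python sketch (not part of policytree or grf) that checks a fitted tree's leaves against a per-arm minimum. It assumes you have already exported, for each unit, the leaf it lands in (in R, something like `predict(tree, X, type = "node.id")`) and its original treatment label W. The helper names and the toy data are hypothetical. The toy leaf 1 has 500 units, so it satisfies min.node.size=500, yet two arms fall below 100.

```python
from collections import Counter, defaultdict


def per_leaf_treatment_counts(leaf_ids, w):
    """Count units of each treatment arm within each leaf (hypothetical helper)."""
    counts = defaultdict(Counter)
    for leaf, arm in zip(leaf_ids, w):
        counts[leaf][arm] += 1
    return counts


def violating_leaves(leaf_ids, w, min_per_arm):
    """Return {leaf: {arm: count}} for every arm below min_per_arm in a leaf.

    Arms that never appear in a leaf are counted as 0 (Counter returns 0
    for missing keys).
    """
    arms = set(w)
    counts = per_leaf_treatment_counts(leaf_ids, w)
    bad = {}
    for leaf, c in counts.items():
        short = {a: c[a] for a in arms if c[a] < min_per_arm}
        if short:
            bad[leaf] = short
    return bad


# Toy data: two leaves of 500 units each, 5 arms labelled 0..4.
leaf_ids = [1] * 500 + [2] * 500
w = ([0] * 120 + [1] * 110 + [2] * 100 + [3] * 90 + [4] * 80   # leaf 1: unbalanced
     + [0] * 100 + [1] * 100 + [2] * 100 + [3] * 100 + [4] * 100)  # leaf 2: balanced

print(violating_leaves(leaf_ids, w, min_per_arm=100))
```

Both leaves pass a 500-unit node size, but only leaf 2 meets the 100-per-arm requirement.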

Thank you for any leads!

erikcs commented

Hi @gabriel-ruiz, unfortunately the current min.node.size only operates on the total number of samples. We tried out some more general criteria, like the variance of the centered treatment indicators, that could capture your use case, but since policytree does exact tree search, even moderately simple bookkeeping like this slowed it down, so we left it as is. If you really want to modify the source yourself, the place to do it would be here.
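For a sense of what that bookkeeping involves, here is a minimal Python sketch of a per-arm constraint inside a split sweep. While scanning sorted values of one feature, it keeps per-arm counts on each side of the candidate split and accepts a threshold only if both children retain at least `min_per_arm` units of every arm. The function and parameter names are hypothetical, and this is an illustration of the idea, not policytree's actual tree-search code.

```python
from collections import Counter


def feasible_splits(x, w, min_per_arm):
    """Return thresholds t (split rule: x <= t goes left) at which both
    children keep at least min_per_arm units of every treatment arm.

    Hypothetical illustration of per-arm bookkeeping; not policytree code.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    arms = set(w)
    left = Counter()       # per-arm counts left of the split
    right = Counter(w)     # per-arm counts right of the split
    splits = []
    for pos in range(len(order) - 1):
        i = order[pos]
        # Move unit i from the right child to the left child.
        left[w[i]] += 1
        right[w[i]] -= 1
        # Only split between distinct feature values.
        if x[i] == x[order[pos + 1]]:
            continue
        # A plain min.node.size check would only compare two totals;
        # here every arm's counter must be tested on both sides.
        if all(left[a] >= min_per_arm and right[a] >= min_per_arm for a in arms):
            splits.append(x[i])
    return splits


print(feasible_splits([1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 0, 1], min_per_arm=1))
```

Compared with a total-count check, every candidate threshold now requires updating and testing one counter per arm, and an exact search repeats that work across all candidate trees.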

Got it, thank you. I'll try it out. Worst case, a tree for each treatment group should do the trick for my use case.

Hello,

Just wanted to share a manuscript regarding this earlier comment.

The empirical results section has the multi-tree workaround I mentioned.

Thanks if you take a look!

https://arxiv.org/abs/2403.04039

Thank you for sharing, this looks very cool!