grf-labs/policytree

policy_tree() can't scale to my data size but multi_causal_forest() can; can I just use the argmax of multi-action treatment effect estimates as a good policy?

JunhaoWang opened this issue · 2 comments

policy_tree() can't scale to my data size (100,000 obs, 200-dimensional state/covariates, 20 actions), but multi_causal_forest() can. Can I just use the argmax of the multi-action treatment effect estimates as a good policy, instead of searching exhaustively through tree functions that map states to actions?

There is a note on scaling in the online documentation here:

https://grf-labs.github.io/policytree/articles/policytree.html#gauging-the-runtime-of-tree-search

As you can see, the cardinality of the Xj's is important, and you can speed things up by increasing split.step (which in effect rounds the Xj's).
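For example (a minimal sketch on synthetic data; the sizes here are just for illustration):

```r
library(policytree)

# Toy problem: n observations, p covariates, and a reward matrix Gamma
# with one column per action.
n <- 2000; p <- 30; n.actions <- 20
X <- matrix(rnorm(n * p), n, p)
Gamma <- matrix(rnorm(n * n.actions), n, n.actions)

# Option 1: consider only every 10th split point along each covariate.
tree1 <- policy_tree(X, Gamma, depth = 2, split.step = 10)

# Option 2: round the covariates first, which lowers their cardinality
# directly and has a similar effect on runtime.
tree2 <- policy_tree(round(X, 2), Gamma, depth = 2)
```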

But n = 100k and p = 200 will not finish in an agreeable amount of time. You can try to reduce the dimensionality by only using, say, the 20 variables with the highest split frequencies across the 20 causal forests.
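A sketch of that selection step (the forests below are stand-ins fit on synthetic data; with a fitted multi_causal_forest you would use the per-action causal forests it contains, however your fit exposes them):

```r
library(grf)

# Stand-in setup: a few causal forests on synthetic data, playing the
# role of the per-action forests.
n <- 500; p <- 10
X <- matrix(rnorm(n * p), n, p)
forests <- lapply(1:3, function(a) {
  W <- rbinom(n, 1, 0.5)            # placeholder treatment indicator
  Y <- X[, 1] * W + rnorm(n)        # placeholder outcome
  causal_forest(X, Y, W)
})

# Rank variables by total split frequency across the forests, then keep
# the top k (k = 20 for your 200-dimensional problem).
freq <- Reduce(`+`, lapply(forests, function(f)
  colSums(split_frequencies(f, max.depth = 4))))
top.vars <- order(freq, decreasing = TRUE)[1:5]

# Then run the tree search on the reduced covariate set, e.g.
# policy_tree(X[, top.vars], Gamma, depth = 2, split.step = 10)
```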

The argmax policy is discussed in section 5.1 (the California GAIN example) of https://arxiv.org/pdf/1702.02896.pdf (where it is referred to as the plug-in policy) and may be fine, depending on your purpose (whether or not you need interpretable policies).
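The plug-in policy itself is a one-liner once you have the per-action effect estimates (tau.hat below is a placeholder for an n x num-actions matrix, e.g. from predict() on your fitted multi-action forest):

```r
# Plug-in ("argmax") policy: assign each unit the action with the
# largest estimated treatment effect.
tau.hat <- matrix(rnorm(100 * 20), 100, 20)  # placeholder estimates
pi.plugin <- apply(tau.hat, 1, which.max)    # action index per unit
table(pi.plugin)                             # how the actions are distributed
```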

For practical reference, here is a short table of empirical run times for policy_tree (version 1.0).

| depth | n (continuous) | features | actions | split.step | time |
|-------|----------------|----------|---------|------------|------|
| 2 | 1,000 | 30 | 20 | 1 | 1.5 min |
| 2 | 1,000 | 30 | 20 | 10 | 7 sec |
| 2 | 10,000 | 30 | 20 | 1 | 3 hrs |
| 2 | 10,000 | 30 | 20 | 10 | 14 min |
| 2 | 10,000 | 30 | 20 | 1, but round(X, 2) | 8 min |
| 2 | 100,000 | 30 | 20 | 10 | 50 hrs |
| 2 | 100,000 | 30 | 20 | 1, but round(X, 2) | 6.3 hrs |
| 2 | 100,000 | 60 | 20 | 1, but round(X, 2) | 25 hrs |
| 2 | 100,000 | 30 | 3 | 10 | 7.4 hrs |
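To gauge what a given configuration costs on your own hardware before committing to a full run, a quick synthetic benchmark along these lines can help (the sizes mirror the first row of the table):

```r
library(policytree)

# Synthetic data matching one row of the table above.
n <- 1000; p <- 30; n.actions <- 20
X <- matrix(rnorm(n * p), n, p)
Gamma <- matrix(rnorm(n * n.actions), n, n.actions)

# Time the tree search at the settings of interest.
system.time(policy_tree(X, Gamma, depth = 2, split.step = 1))
```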