DoubleML/doubleml-for-py

[Feature Request]: Add Weights to IRM Model for Policy Evaluation

OliverSchacht opened this issue · 3 comments

Describe the feature you want to propose or implement

Currently, the best way to evaluate a policy (e.g. one derived with the IRM policy_tree() method) is the gate() method. However, this has two disadvantages. First, it only allows $\pi(X) \in \{0,1\}$, whereas a policy might be defined with $0 \leq \pi(X) \leq 1$. Second, gate() does not provide a sensitivity analysis.
With this feature request, I suggest adding a weights option to the IRM model.

Propose a possible solution or implementation

Allow the DoubleMLIRM object to take an $(n \times d)$ array of weights (one weight per observation and treatment) that modifies the ATE score, as proposed in the Long Story Short paper, to obtain a weighted average treatment effect. The default should be weights=None, which estimates the ATE and is equivalent to the current ATE implementation. If score='ATTE', weights should not be allowed.
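For reference, here is my reading of the weighted score (to be checked against the Long Story Short paper). With weights $\omega(W)$, $\bar{\omega}(X) = E[\omega(W) \mid X]$, outcome regressions $g(d,X)$ and propensity score $m(X)$, the target $\theta = E[\omega(W)(g(1,X) - g(0,X))]$ solves $E[\psi(W;\theta,\eta)] = 0$ for

$$
\psi(W;\theta,\eta) = \omega(W)\big(g(1,X) - g(0,X)\big) + \bar{\omega}(X)\left(\frac{D\,(Y - g(1,X))}{m(X)} - \frac{(1-D)\,(Y - g(0,X))}{1 - m(X)}\right) - \theta .
$$

With $\omega = \bar{\omega} = 1$ this reduces to the usual IRM ATE score, which is why weights=None can fall back to the current ATE. For a policy that depends on $X$ only, $\omega = \bar{\omega} = \pi(X)$.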
Additionally, in a later step, we might add an evaluate_policy() method that computes the policy value and changes the weights of an existing object without refitting (if possible).
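A possible shape for such an evaluate_policy() helper, sketched under assumptions. The function name and signature are hypothetical and not part of DoubleML. The nuisance predictions $g(0,X)$, $g(1,X)$ and $m(X)$ are assumed to be stored from a previous fit. The policy value is estimated with the standard doubly robust (AIPW) pseudo-outcomes, where a policy $\pi(X) \in [0,1]$ treats with probability $\pi(X)$.

```python
import numpy as np


def evaluate_policy(y, d, g0_hat, g1_hat, m_hat, policy):
    """Doubly robust estimate of the policy value E[Y(pi)] (hypothetical helper).

    Only stored predictions are used: g0_hat ~ E[Y | D=0, X],
    g1_hat ~ E[Y | D=1, X], m_hat ~ P(D=1 | X). Changing the policy
    therefore requires no refitting of the nuisance learners.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    policy = np.asarray(policy, dtype=float)
    if np.any((policy < 0) | (policy > 1)):
        raise ValueError("policy values must lie in [0, 1]")
    # AIPW pseudo-outcomes for Y(1) and Y(0)
    psi1 = g1_hat + d * (y - g1_hat) / m_hat
    psi0 = g0_hat + (1 - d) * (y - g0_hat) / (1 - m_hat)
    # A stochastic policy treats with probability pi(X)
    scores = policy * psi1 + (1 - policy) * psi0
    value = scores.mean()
    std_err = scores.std(ddof=1) / np.sqrt(len(scores))
    return value, std_err


# Simulated data; the "stored" predictions are set to the true nuisance functions
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
m = np.full(n, 0.5)
d = rng.binomial(1, m)
y = x + d * (1 + x) + rng.normal(size=n)
g0, g1 = x, 1 + 2 * x

pi = 1 / (1 + np.exp(-x))  # a policy with values strictly between 0 and 1
v_pi, se_pi = evaluate_policy(y, d, g0, g1, m, pi)
v_none, _ = evaluate_policy(y, d, g0, g1, m, np.zeros(n))
```

Because the estimate is linear in $\pi$, the gain over treating nobody, $V(\pi) - V(0)$, coincides with the weighted ATE estimate using $\omega = \bar{\omega} = \pi(X)$, so both views can share the same stored predictions.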

Did you consider alternatives to the proposed solution. If yes, please describe

The alternative would be to add weights to the sensitivity_analysis() function. However, this would be considerably more complex, since the coefficient is currently not recalculated there, and it would change the DoubleML base class, with implications for every other model.

Hi @SvenKlaassen,
I added a basic implementation on branch o-irm-weights, as well as a notebook in the docs.
The implementation seems to work fine and does not affect the existing methods.
However, I haven't yet figured out how to support weights with repeated cross-fitting and multiple treatments. We may eventually have to add the weights to the score_components to be able to use the right weights at the right time.

Ah, great. I will try to have a look at the implementation.
I would try to avoid using if statements to improve the readability of the code. Maybe we can set the weights to 1 if weights=None, so that the same formulas can be used in either case.
Additionally, you can add unit tests and run the simulations on the same sample splits to check that we get exactly the same results.
The weights should be identical for repeated cross-fitting and can be reused in each repetition.
Currently the IRM does not support multiple treatments (so the weights don't need to either).
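To make these suggestions concrete, here is a deliberately simplified sketch. The class and method names are hypothetical, not DoubleML's API, and it assumes a randomized treatment with known propensity 0.5, linear outcome models fitted with numpy least squares, and weights that depend only on X (so $\bar{\omega} = \omega$). weights=None is resolved to ones once in the constructor, so the score formula has no branches. The fold assignments are drawn once, so weighted and unweighted fits share identical sample splits, and the same weight vector is reused in every repetition.

```python
import numpy as np


class WeightedIRMSketch:
    """Hypothetical, minimal IRM-style estimator (not the DoubleMLIRM API).

    Assumptions for illustration: randomized treatment with known
    propensity m(X) = 0.5, linear outcome regressions g(d, X) fitted by
    least squares on the training folds, and weights depending on X only
    (so weights_bar = weights).
    """

    def __init__(self, x, y, d, weights=None, n_folds=2, n_rep=3, seed=0):
        self.x, self.y, self.d = x, y, d
        n = len(y)
        # Resolve the default once: weights=None means weights = 1,
        # so the score formula in fit() needs no branching.
        self.weights = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
        self.n_folds, self.n_rep = n_folds, n_rep
        rng = np.random.default_rng(seed)
        # One fold assignment per repetition, drawn once, so fits with the
        # same seed use identical sample splits.
        self.folds = [rng.permutation(n) % n_folds for _ in range(n_rep)]

    def _fit_predict(self, train, test, arm):
        # OLS of y on (1, x) within one treatment arm of the training fold
        sel = train & (self.d == arm)
        design = np.column_stack([np.ones(sel.sum()), self.x[sel]])
        beta, *_ = np.linalg.lstsq(design, self.y[sel], rcond=None)
        return beta[0] + beta[1] * self.x[test]

    def fit(self):
        m_hat = 0.5  # known propensity (assumption)
        thetas = []
        for fold_ids in self.folds:
            psi_b = np.empty(len(self.y))
            for k in range(self.n_folds):
                test = fold_ids == k
                train = ~test
                g0 = self._fit_predict(train, test, 0)
                g1 = self._fit_predict(train, test, 1)
                y, d, w = self.y[test], self.d[test], self.weights[test]
                # Same weights array in every repetition; only the folds change.
                psi_b[test] = w * (g1 - g0) + w * (
                    d * (y - g1) / m_hat - (1 - d) * (y - g0) / (1 - m_hat)
                )
            # psi_a = -1, so theta solves mean(psi_a) * theta + mean(psi_b) = 0
            thetas.append(psi_b.mean())
        # Aggregate the repetitions by the median
        self.coef = float(np.median(thetas))
        return self


rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
d = rng.binomial(1, 0.5, size=n)
y = x + d * (1 + x) + rng.normal(size=n)  # true ATE = E[1 + x] = 1

est_none = WeightedIRMSketch(x, y, d).fit()
est_ones = WeightedIRMSketch(x, y, d, weights=np.ones(n)).fit()
est_twos = WeightedIRMSketch(x, y, d, weights=2 * np.ones(n)).fit()
```

Comparing weights=None with weights=np.ones(n) on the same folds is the kind of exact-equality check a unit test could make, while constant weights of 2 should simply double the estimate.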

Thank you for your feedback. Then it sounds like the current solution is already fine.
I agree that the current solution decreases the readability of the code. However, the coefficient and psi are computed in the LinearScoreMixin, which is also used by classes other than DoubleMLIRM. The alternative to an if statement there would be to add weights to every DoubleML class that uses the linear score and set them to $1$, which is also not ideal.
A solution that limits the changes to DoubleMLIRM would be to modify psi_b. However, this could have implications for the sensitivity analysis and would also not allow changing the weights when the predictions are not stored. What do you think?
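A rough sketch of the psi_b option, under assumptions. Both function names are hypothetical and only mirror the idea of a linear score, $\hat{\theta} = -\bar{\psi}_b / \bar{\psi}_a$ with $\psi = \psi_a \hat{\theta} + \psi_b$. They do not reproduce DoubleML's internals. The weights enter only psi_b while psi_a stays $-1$, so the generic solver needs no weights argument. Changing the weights then means recomputing psi_b, which requires the nuisance predictions to be available.

```python
import numpy as np


def solve_linear_score(psi_a, psi_b):
    """Solve mean(psi_a) * theta + mean(psi_b) = 0 (hypothetical helper)."""
    theta = -np.mean(psi_b) / np.mean(psi_a)
    psi = psi_a * theta + psi_b
    # Sandwich variance for a score that is linear in theta
    var = np.mean(psi ** 2) / np.mean(psi_a) ** 2 / len(psi)
    return theta, np.sqrt(var), psi


def irm_score_elements(y, d, g0, g1, m, weights, weights_bar):
    """Hypothetical IRM score elements with the weights folded into psi_b only."""
    psi_a = -np.ones_like(y, dtype=float)
    psi_b = weights * (g1 - g0) + weights_bar * (
        d * (y - g1) / m - (1 - d) * (y - g0) / (1 - m)
    )
    return psi_a, psi_b


rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
m = 1 / (1 + np.exp(-x))  # true propensity, used as the "stored" prediction
d = rng.binomial(1, m)
y = x + d * (1 + x) + rng.normal(size=n)
g0, g1 = x, 1 + 2 * x  # "stored" outcome predictions (true functions here)

# Plain ATE: weights = weights_bar = 1
theta_ate, se_ate, _ = solve_linear_score(*irm_score_elements(y, d, g0, g1, m, 1.0, 1.0))
# Policy pi(X) in [0, 1]; it depends on X only, so weights_bar = weights = pi
pi = 1 / (1 + np.exp(-2 * x))
theta_pi, se_pi, _ = solve_linear_score(*irm_score_elements(y, d, g0, g1, m, pi, pi))
```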
For now the open points are:

  • Run simulations with fixed folds
  • Create unit tests
  • Enhance example and documentation