DoubleML/doubleml-for-py

CATE with continuous treatment and categorical outcome

hadi-gharibi opened this issue · 7 comments

I was trying to use DML with a continuous treatment (price) and a binary outcome (churn). Based on the docs, it's not possible to apply any of these techniques to this case. Is there any way to adjust any of these algorithms for this setup? If there is a paper that I could implement and add to this library, I'm happy to do that as well.

Dear @hadi-gharibi,

thanks for opening this issue. Here you can find two references:

I'm not really sure I understand your question correctly:

Do you want to estimate dose-response relationships for a continuous treatment and a binary outcome, or do you really want to estimate CATEs after such an estimation? I think the interpretation can be a bit tricky in the latter case. Maybe @SvenKlaassen has some thoughts on this.

Really good point. Maybe calculating the CATE is not necessary after all; estimating dose-response relationships would be sufficient.
Let me explain my problem in more detail. I define the treatment as a percentage increase from the current price (imagine a personalized subscription price). For example, 0.2 means a 20% increase. In this case, interpreting the CATE should be fine, but after thinking about your comment, I think it might not be important after all. Estimating dose-response relationships should be good enough.

Since I’m new to the topic, could you please tell me if DML is even the correct tool for me given this problem? Am I on the right track?

Yes, I think DML would be suitable to estimate dose-response relationships.
But currently, we have no model that specifically handles the combination of continuous treatment and binary outcomes.
I think the reference "Double/debiased machine learning for logistic partially linear model" posted by @PhilippBach is really helpful for this setting.
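
For orientation, the two model classes discussed here can be written side by side. This is a sketch in the notation commonly used for partially linear models; the symbols $\theta_0$, $\beta_0$, $g_0$, $r_0$ and $\zeta$ are assumptions for illustration and are not taken from this thread.

$$
\text{PLR:}\quad Y = \theta_0 D + g_0(X) + \zeta, \qquad \mathbb{E}[\zeta \mid D, X] = 0
$$

$$
\text{Logistic PLR:}\quad \mathbb{P}(Y = 1 \mid D, X) = \operatorname{expit}\bigl(\beta_0 D + r_0(X)\bigr), \qquad \operatorname{expit}(t) = \frac{1}{1 + e^{-t}}
$$

In the PLR, $\theta_0$ is an effect on the scale of $Y$ itself (for a binary $Y$, a change in probability), whereas $\beta_0$ in the logistic model is a conditional log odds ratio, so the two parameters are not directly comparable.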

Thanks for the response, @PhilippBach and @SvenKlaassen.
Since I need this, I have to make it work one way or another. This setup is really common in industry, so I believe it would be a nice feature. If you think it would fit into the DoubleML library, I can give it a shot.
I could extend DoubleMLPLR to support continuous treatment and binary outcomes, but I think it would need a different _check_learner and lots of if/else branches here and there. I don't think that would be a good approach.

Another way is to have a separate class for that, like DoubleMLLPLR for the logistic partially linear model. It would work just fine.

Yet another idea could be to keep DoubleMLPLR as a strategy class that picks the correct approach (linear or logistic class) based on the dataset.
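
As a rough illustration of the dispatch idea, the check below decides between a linear and a logistic variant by inspecting the outcome column. The function name `pick_plm_variant` and the returned labels are hypothetical and do not exist in DoubleML; this only sketches what "based on the dataset" could mean.

```python
import numpy as np


def pick_plm_variant(y):
    """Hypothetical helper: 'logistic' if y only takes values in {0, 1}, else 'linear'."""
    values = np.unique(np.asarray(y))
    is_binary = values.size <= 2 and bool(np.isin(values, [0, 1]).all())
    return "logistic" if is_binary else "linear"


print(pick_plm_variant([0, 1, 1, 0]))      # logistic
print(pick_plm_variant([0.2, 1.7, 3.1]))   # linear
```

One drawback of dispatching silently like this is that the target parameter changes with the data (an effect on the outcome scale in the linear case, a log odds ratio in the logistic case), which is an argument for an explicit separate class.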

That would be really great.
I would also agree that a separate class DoubleMLLPLR would be the better choice.
Especially since the class would be more complicated due to the nested cross-fitting structure.

This is not completely related to this topic but has some overlap.

On the topic of binary outcomes: DoubleMLPLR is mentioned, but I noticed that ml_l is treated as a learner for a continuous outcome in this documentation. Looking at the code, it looks like the predictions for the binary outcome use class labels instead of probabilities (predict() vs predict_proba()).

  1. Is the above information correct? Please let me know if I missed something. I was also looking at whether DoubleMLIRM is better suited instead.
  2. If 1) is true, are there implications to be concerned about when it comes to the residuals and the resulting quality of the ATE? My suspicion is that there'd be more bias.
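
To make the concern about residuals concrete, here is a small simulation sketch. It does not fit any learner: the true probability stands in for what predict_proba() would ideally return and its thresholded version for what predict() would return. The data-generating process and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated binary outcome with a known conditional probability p(x).
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))            # true P(Y = 1 | X = x)
y = rng.binomial(1, p)

# Two stand-ins for the outcome nuisance prediction used in the residual Y - l(X):
pred_label = (p > 0.5).astype(float)    # hard class label (ideal predict())
pred_proba = p                          # class-1 probability (ideal predict_proba()[:, 1])

res_label = y - pred_label              # can only be -1, 0 or 1
res_proba = y - pred_proba              # continuous in (-1, 1)

mse_label = np.mean(res_label ** 2)
mse_proba = np.mean(res_proba ** 2)
print(f"label residual values: {np.unique(res_label)}, MSE = {mse_label:.3f}")
print(f"probability residual MSE = {mse_proba:.3f}")
```

Even with a perfect classifier, label-based residuals only take the values -1, 0 and 1 and have a larger mean squared error than probability-based residuals, so the outcome nuisance is estimated less accurately when class labels are used.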

Hi @chelsealee14

thanks for your message. I'm not 100% sure I get your point here, as this issue is about CATEs.

DoubleMLPLR allows for continuous outcomes only (as of now), see

_ = self._check_learner(ml_l, 'ml_l', regressor=True, classifier=False)

The linearity in the PLR model is a general challenge when predicting probabilities and binary outcomes. I don't remember exactly, but I think we ran some simulations for a binary outcome $Y$ with DoubleMLPLR and a classification learner ml_l, and it worked quite well. This would not be covered by theory, but it could be a practical solution. Feel free to change the line of code above (setting classifier=True should be more or less enough) and try it out yourself. The DoubleMLIRM model does not impose such a linearity assumption and should work for binary outcomes too.
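
For anyone who wants to experiment before touching the library, below is a minimal hand-rolled sketch of cross-fitted partialling-out with a continuous treatment and a binary outcome. It is not DoubleML's implementation: plain least squares stands in for the learners ml_l and ml_m, and the simulated data, variable names and the true effect of 0.2 are assumptions. The outcome follows a linear probability model, so the PLR coefficient is the change in churn probability per unit of treatment.

```python
import numpy as np

rng = np.random.default_rng(42)
n, theta_true = 20_000, 0.2

# Simulated data (assumed data-generating process, for illustration only).
x = rng.normal(size=(n, 3))
d = 0.5 * x[:, 0] + rng.normal(scale=0.3, size=n)               # continuous treatment
p = np.clip(0.5 + theta_true * d + 0.05 * x[:, 1], 0.01, 0.99)  # linear probability
y = rng.binomial(1, p).astype(float)                            # binary outcome


def ols_fit_predict(x_train, t_train, x_test):
    """Least-squares nuisance learner: regress t on [1, x], predict on x_test."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    coef, *_ = np.linalg.lstsq(a_train, t_train, rcond=None)
    return a_test @ coef


# Cross-fitting: nuisance predictions for each fold come from the other folds.
folds = np.array_split(rng.permutation(n), 5)
u = np.empty(n)  # outcome residuals   Y - E[Y | X]
v = np.empty(n)  # treatment residuals D - E[D | X]
for test_idx in folds:
    train = np.ones(n, dtype=bool)
    train[test_idx] = False
    u[test_idx] = y[test_idx] - ols_fit_predict(x[train], y[train], x[test_idx])
    v[test_idx] = d[test_idx] - ols_fit_predict(x[train], d[train], x[test_idx])

# Partialling-out estimate and a plug-in standard error from the score.
theta_hat = np.sum(v * u) / np.sum(v * v)
psi = v * (u - theta_hat * v)
se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(v * v)
print(f"theta_hat = {theta_hat:.3f} (se = {se:.3f}), true value = {theta_true}")
```

Note that the outcome nuisance here predicts a continuous value for $\mathbb{E}[Y \mid X]$ rather than a class label. If the true relationship is not linear on the probability scale, the estimate is only a linear approximation.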

We are also preparing an implementation of the logistic PLR model, which would be a partially linear model for binary outcomes. But that's still WIP.

I hope this helps!