Modifying DMLDiD: Double/debiased machine learning for difference-in-differences models
[1] Chang, N.-C. (2020). Double/debiased machine learning for difference-in-differences models, The Econometrics Journal, 23(2), 177–191.
- This paper proposes the following two models:
- repeated outcome (dmldid_ro):
- I believe this to be a theoretically valid and excellent achievement.
- However, the implementation in Chang (2020) has several deficiencies. In addition, the simulation data in the paper are not suitable for demonstrating the theory. These points have been corrected and presented in my blog.
- repeated cross-section (dmldid_rcs):
- I do not believe this model is adequate. This repository examines it and attempts to modify it.
[2] Abadie A. (2005). Semiparametric difference-in-differences estimators, Review of Economic Studies, 72, 1–19.
- Cited in Chang (2020).
- Proposes an IPW estimator for DiD.
[3] Chernozhukov V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, J. Robins (2018). Double/debiased machine learning for treatment and structural parameters, Econometrics Journal, 21, C1–C68.
- The original DML paper.
- Chang (2020) defines the ATT estimator for repeated cross-section data in terms of the following quantities:
- Y: outcome
- T: binary time indicator (post = 1 / pre = 0)
- D: binary group indicator (treated group = 1 / control group = 0)
- X: covariates
- p̂k: sample mean of D (computed with cross-fitting)
- λ̂k: sample mean of T (computed with cross-fitting)
- ĝk(X): propensity score (estimated with cross-fitting)
- l2k is the following ML model (Chang (2020) uses Lasso, but essentially any ML method can be used):
- Supervised label: (Ti−λ̂k)Yi
- Features: covariates X (denoted q when arbitrary transformations of X are applied)
- Training data: untreated units only (D=0)
- Fitted with cross-fitting
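For reference, the estimator can be written out from the quantities listed above. The expression below is a reconstruction from those definitions rather than a verbatim copy of the paper, so it should be checked against Chang (2020) before being relied on. The symbols K (number of folds), I_k (the indices in fold k), and n (the number of observations in a fold) are notation assumed here.

$$
\tilde{\theta} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{n}\sum_{i\in I_k}
\frac{D_i-\hat{g}_k(X_i)}{\hat{p}_k\,\hat{\lambda}_k\,(1-\hat{\lambda}_k)\,\bigl(1-\hat{g}_k(X_i)\bigr)}
\Bigl((T_i-\hat{\lambda}_k)\,Y_i-\hat{\ell}_{2k}(X_i)\Bigr)
$$

Here ℓ̂2k is the model called l2k in this README: it is trained on the auxiliary sample (all folds except k) and evaluated on fold k.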
- For positive Y, the label (Ti−λ̂k)Yi is positive when T=1 and negative when T=0, so its sign flips with a change in T alone.
- On the other hand, X (q) does not necessarily contain time-dependent variables. Even if it does, it is difficult to predict the above label properly with a linear model (Lasso) such as the one used in Chang (2020).
- For example, if the covariates are not time-dependent (e.g., demographic information such as sex or race), this label cannot be predicted (the model can still be fit without error, but its predictions are meaningless).
- Even if the covariates include time-dependent variables, a large number of interaction terms between time-dependent and time-independent variables would have to be added. Even then, it is almost impossible to predict a label whose sign flips depending on a variable the model does not observe (T is not an input to l2k).
- The (Ti−λ̂k) factor need not be part of the prediction task. The model should only be designed to estimate the latent outcome Y.
- The notebook on this issue is here.
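The sign-flip problem can be illustrated with a small simulation. This is only a sketch under assumed settings: the data-generating process, the sample size, and the helper `r_squared` are hypothetical and are not taken from the notebook. All simulated units are treated as controls (D=0), the covariates are time-independent, and plain OLS stands in for Lasso.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on X with an intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 4000

# Hypothetical control-group data: time-independent covariates X,
# a time indicator T, and an outcome with a common time trend.
X = rng.normal(size=(n, 3))
T = rng.integers(0, 2, size=n)
Y = 5.0 + X @ np.ones(3) + 1.0 * T + rng.normal(size=n)

# Label used for l2k: its sign depends on T, which is not a feature.
lam_hat = T.mean()
label = (T - lam_hat) * Y

# Fit quality when predicting the sign-flipping label from X alone ...
r2_label = r_squared(X, label)
# ... versus predicting Y itself within each period.
r2_t1 = r_squared(X[T == 1], Y[T == 1])
r2_t0 = r_squared(X[T == 0], Y[T == 0])

print(f"R^2, label (T - lambda) * Y on X: {r2_label:.3f}")
print(f"R^2, Y on X given T=1:           {r2_t1:.3f}")
print(f"R^2, Y on X given T=0:           {r2_t0:.3f}")
```

In this design X explains most of the variation in Y within each period but almost none of the variation in the label, which is the motivation for modelling Y separately by period.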
- First, divide l2k into two parts, l2k_t1 and l2k_t0, each of which can be an arbitrary machine learning model:
- l2k_t1 = E[Y | T=1, D=0, X]
- l2k_t0 = E[Y | T=0, D=0, X]
- These two models, l2k_t1 and l2k_t0, then replace l2k in the ATT estimator.
- This achieves double robustness with respect to the propensity score and the outcome model (l2k).
- Cross-fitting is used as before.
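As a rough sketch of how the split models could enter the estimator: if T is independent of (X, D) with P(T=1) = λ, then E[(T−λ)Y | X, D=0] = λ(1−λ)(l2k_t1(X) − l2k_t0(X)), so the two outcome models can stand in for l2k in a score of the form used by Chang (2020). The code below is a minimal numpy-only illustration under that assumption. The function names (`fit_ols`, `fit_logit`, `modified_dmldid_rcs`), the OLS outcome models, and the Newton-step logistic propensity model are hypothetical choices for this example and may differ from what the repository's notebook does.

```python
import numpy as np

def _design(X):
    """Add an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return coef

def predict_ols(coef, X):
    return _design(X) @ coef

def fit_logit(X, d, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method (tiny ridge for stability)."""
    Xb = _design(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (d - p) - ridge * w
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb + ridge * np.eye(len(w))
        w = w + np.linalg.solve(hess, grad)
    return w

def predict_logit(w, X):
    return 1.0 / (1.0 + np.exp(-_design(X) @ w))

def modified_dmldid_rcs(Y, D, T, X, n_folds=2, seed=0, clip=0.01):
    """Cross-fitted ATT estimate with l2k built from l2k_t1 and l2k_t0."""
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    theta_k = []
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisance quantities are estimated on the other folds only.
        p_hat = D[train].mean()
        lam_hat = T[train].mean()
        g = predict_logit(fit_logit(X[train], D[train]), X[test])
        g = np.clip(g, clip, 1.0 - clip)
        # Outcome models fitted on untreated units, separately by period.
        c1 = train & (D == 0) & (T == 1)
        c0 = train & (D == 0) & (T == 0)
        l_t1 = predict_ols(fit_ols(X[c1], Y[c1]), X[test])
        l_t0 = predict_ols(fit_ols(X[c0], Y[c0]), X[test])
        # Assumed plug-in for l2k (requires T independent of X and D).
        l2k = lam_hat * (1.0 - lam_hat) * (l_t1 - l_t0)
        weight = (D[test] - g) / (p_hat * lam_hat * (1.0 - lam_hat) * (1.0 - g))
        theta_k.append(np.mean(weight * ((T[test] - lam_hat) * Y[test] - l2k)))
    return float(np.mean(theta_k))
```

Any learner with fit and predict steps could replace the OLS and logistic helpers; they are used here only to keep the example free of dependencies beyond numpy.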
The following simulation data were created:
- repeated cross-section data
- true ATT = 3
- dim(X) = 10
- N = 500
- The notebook is here.
- true ATT = 3
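Data with the settings listed above (repeated cross-section, true ATT = 3, dim(X) = 10, N = 500) could be generated along the following lines. This is a hypothetical data-generating process written for illustration, not the one used in the notebook: the function name `simulate_rcs`, the coefficients, and the form of the propensity score and time trend are all assumptions.

```python
import numpy as np

def simulate_rcs(n=500, dim_x=10, att=3.0, seed=0):
    """Simulate repeated cross-section DiD data with a known ATT."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim_x))

    # Treatment-group membership depends on the first covariate.
    propensity = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
    D = rng.binomial(1, propensity)

    # Each unit is observed once, in either the pre or the post period.
    T = rng.binomial(1, 0.5, size=n)

    # Untreated outcome: the time trend also depends on the first covariate,
    # so parallel trends holds only after conditioning on X.
    beta = np.linspace(1.0, 0.1, dim_x)
    Y0 = X @ beta + T * (1.0 + X[:, 0]) + rng.normal(size=n)

    # The treatment effect applies to treated units in the post period only.
    Y = Y0 + att * D * T
    return {"Y": Y, "D": D, "T": T, "X": X, "Y0": Y0}

data = simulate_rcs()
print(data["X"].shape, data["D"].mean(), data["T"].mean())
```

Because the untreated outcome `Y0` is returned alongside `Y`, the true effect for treated post-period units can be checked directly as `Y - Y0`.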