clinicalml/cfrnet

the "ground truth" counterfactual control outcome for the treated in Jobs

Thank you for sharing the code.

It is not clear how to get the "ground truth" counterfactual outcome y_0 for the treated in the Jobs dataset.

It would be really helpful if you could upload the processed Jobs dataset so that we can compare against the current CFR method on the same benchmark.

The datasets can be found on www.fredjo.com or www.mit.edu/~fredrikj.

Best,
Fredrik

Thank you for your reply. It seems the data resources are inaccessible (see the screenshot attached to this comment).

My apologies. I have changed the mirror for now: http://www.fredjo.com/files/jobs_DW_bin.train.npz, http://www.fredjo.com/files/jobs_DW_bin.test.npz. I will try to get the original hosting to work later.
If the problem persists, email me at fredrikj@mit.edu and I can send you the files.
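
For anyone checking that the downloaded files are intact, here is a minimal sketch that lists what an `.npz` archive contains. It makes no assumption about the array names inside the Jobs files; `summarize_npz` is a hypothetical helper name, and the path in the usage note is just wherever you saved the download.

```python
import numpy as np

def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    # np.load on an .npz file returns a lazy archive; the context manager
    # makes sure the underlying file handle is closed afterwards.
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }
```

Calling `summarize_npz("jobs_DW_bin.train.npz")` on a local copy and printing the result should show the stored arrays and their dimensions, which makes it easy to confirm the feature count.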

Thank you, Fredrik.
I hope you can fix it soon, since the links for the other two datasets (IHDP and News) are also broken.
Cheers

It's not so much that they are broken, but that they are locked behind a certificate. I have now also moved IHDP and News to a temporary location.

Best,
F

I'm confused about why the Jobs data you provide has 17-dimensional features. According to the paper, the study by LaLonde includes only 8 covariates. What do these 17 features represent? The values in the original LaLonde data are large; what kind of processing has been applied to the original data?

As mentioned in the paper, we use the extended feature set of

Dehejia, Rajeev H. and Wahba, Sadek. Propensity score-matching methods for nonexperimental causal studies. Review of Economics and Statistics, 84(1):151–161, 2002.

Thanks for your help.
I downloaded this reference, but I still can't find where the extended feature set is described.
For example, in this reference the average age of the samples is over 20 years, but in "http://www.fredjo.com/files/jobs_DW_bin.test.npz" most features are decimals. Did you transform the data? If so, where can I find these operations?

Hi again. The extended feature set is outlined in Table 3 in Smith, Jeffrey A., and Petra E. Todd. "Does matching overcome LaLonde's critique of nonexperimental estimators?" Journal of Econometrics 125.1 (2005): 305-353.

If I recall correctly, non-binary variables were z-scored. In another version, ages were binned into different groups. As I did not write the code for transforming the Jobs dataset, I am not sure at this point. Feel free to contact me at fredrik.johansson@chalmers.se and I can share the original MATLAB scripts.
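
To illustrate why a variable such as age would show up as decimals, here is a rough sketch of z-scoring only the non-binary columns of a feature matrix. This is not the original MATLAB preprocessing; the function name `zscore_nonbinary` and the rule for detecting binary columns are assumptions made for the example.

```python
import numpy as np

def zscore_nonbinary(X):
    """Standardize non-binary columns to zero mean and unit variance.

    Columns whose values are all 0 or 1 are treated as binary indicators
    and left unchanged (an assumption made for this sketch).
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    is_binary = np.all((X == 0) | (X == 1), axis=0)
    cols = ~is_binary
    mean = X[:, cols].mean(axis=0)
    std = X[:, cols].std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    out[:, cols] = (X[:, cols] - mean) / std
    return out
```

With a toy matrix such as `[[20, 1], [30, 0], [40, 1]]`, the first column (an age-like variable) is mapped to values centered on zero, while the second (a 0/1 indicator) is returned as is.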