DRtester error with CausalForestDML, with pandas dataframes, or both?
I am trying to use the DRtester from EconML to understand whether there actually is heterogeneity in the CATE that I am estimating.
As I understand, the DRtester was built as an equivalent function to Athey's test_calibration in R. I followed the example in the notebook you provided (thanks a lot for that btw). While I am able to download and run the linked notebook example without any problems, I am unable to do the same with my own dataframes and/or model specifications.
I am running a CausalForest model with the following setup:
X_train = pandas dataframe of user features with 36 columns and ~200k rows
Y_train = outcome variable with 1 column and ~200k rows
T_train = binary treatment variable with 1 column and ~200k rows
Similarly, I have 3 validation dataframes, Xval, Yval and Tval, which have the same column structure as the training data but fewer rows, as they make up 20% of the data while the training data makes up 80%.
If I run the DRtester:
- using the same DML model as in the example notebook (i.e. exactly the same code from the notebook, changing only the input data to my dataframes rather than the numpy arrays used there), or
- using both my data and the CausalForestDML specification I am working with
I get the same error (detailed below) in both cases.
Could you please help me understand why? Or how could I fix it?
And thank you so much for all the improvements and additions you've been making - I greatly appreciate it!
I am estimating the following CausalForest model:
est = CausalForestDML(criterion='het',
n_estimators=200, #100
max_depth=5, #with 4 also the noise disappears
est.fit(Y_train, T_train, X=X_train, W=None)
Then I tried to use the following in the DRtester:
dml_tester = DRtester(
).fit_nuisance(Xval, Tval, Yval, X_test, T_test, Y_test)
res_dml = dml_tester.evaluate_all(X_test, X)
This results in the following error:
This definitely seems to be a bug.
It seems we are indexing arrays in a way that is compatible with numpy but not compatible with pandas.
I think for now you may just have to convert your pandas dataframes to numpy arrays before passing them to DRTester, via the .values attribute, e.g. X_test.values
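For anyone else hitting this, here is a minimal sketch of the workaround. It uses only pandas and numpy, and the data, column names and row indices are made-up stand-ins, not the poster's data. It shows one way numpy-style row indexing can fail on a dataframe (not necessarily the exact line that fails inside DRTester), and the conversion via .values that avoids it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the validation data described in the post.
rng = np.random.default_rng(0)
Xval = pd.DataFrame(rng.normal(size=(6, 3)), columns=["f0", "f1", "f2"])
Tval = pd.DataFrame({"T": rng.integers(0, 2, size=6)})
Yval = pd.DataFrame({"Y": rng.normal(size=6)})

# Positional row indices, e.g. what a cross-fitting split might produce.
rows = np.array([0, 2, 4])

# On a dataframe, X[rows] looks for *columns* labelled 0, 2 and 4,
# so numpy-style row indexing raises a KeyError here.
try:
    Xval[rows]
    pandas_indexing_failed = False
except KeyError:
    pandas_indexing_failed = True

# Workaround from the reply above: convert to numpy arrays first.
X_arr = Xval.values
# Flattening the single-column frames to 1-D is an assumption of this sketch.
T_arr = Tval.values.ravel()
Y_arr = Yval.values.ravel()

# The same indexing expression now selects rows as intended.
X_subset = X_arr[rows]
```

The converted arrays (X_arr, T_arr, Y_arr, and the equivalents for the other split) can then be passed to fit_nuisance and evaluate_all in place of the dataframes.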
This works, thank you!!