[Bug]: Unexpected shapes of predictions and feature importances with multiple treatment variables
Nolan3036 opened this issue · 2 comments
Describe the bug
When I use my own data (three variables in D, four variables in X), the predictions for both "ml_l" and "ml_m" have shape (n_obs, iteration, number of variables in D). Shouldn't it be (n_obs, iteration, 1) for "ml_l"?
Furthermore, the feature importance scores of the fitted models for both "ml_l" and "ml_m" have shape (6,). Shouldn't they be (4,) in my case, since X has four variables?
Your provided example works fine, but it only has one variable in D, so it is hard to debug from there; you can reproduce the behavior with my code below.
I hope I am not missing anything, but if I am, please let me know. Thanks!
Minimum reproducible code snippet
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

import doubleml as dml
from doubleml import DoubleMLData

# Toy data: three treatment variables (d1-d3), four controls (x1-x4), one outcome (y).
test1 = pd.DataFrame({
    'd1': np.random.randn(100),
    'd2': np.random.randn(100),
    'd3': np.random.randn(100),
    'x1': np.random.randn(100),
    'x2': np.random.randn(100),
    'x3': np.random.randn(100),
    'x4': np.random.randn(100),
    'y': np.random.randn(100),
})

obj_dml_data_from_df = DoubleMLData(test1, 'y', ["d1", "d2", "d3"])
ml_l = XGBRegressor(random_state=0)
ml_m = XGBRegressor(random_state=0)
dml_plr_obj = dml.DoubleMLPLR(obj_dml_data_from_df, ml_l, ml_m).fit(store_models=True)

print(dml_plr_obj.predictions["ml_l"].shape)
print(dml_plr_obj.predictions["ml_m"].shape)
print(dml_plr_obj.models["ml_l"]["d1"][0][0].feature_importances_.shape)
print(dml_plr_obj.models["ml_m"]["d1"][0][0].feature_importances_.shape)
Expected Result
(100, 1, 1)
(100, 1, 3)
(4,)
(4,)
Actual Result
(100, 1, 3)
(100, 1, 3)
(6,)
(6,)
Versions
Linux-5.4.0-150-generic-x86_64-with-glibc2.27
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
DoubleML 0.7.1
Scikit-Learn 1.0.2
This is intended, as the model generally switches variables between the roles of features and treatments:

The partially linear model assumes the following form for a single treatment

$$Y = D\theta_0 + g_0(X) + \zeta, \quad E[\zeta \mid D, X] = 0,$$

which would generally extend to

$$Y = D_1\theta_1 + D_2\theta_2 + D_3\theta_3 + g_0(X) + \zeta$$

for three treatments. Considering only the estimation of $\theta_1$, we are back in the single-treatment model with

$$\tilde{D} = D_1, \qquad \tilde{X} = (X, D_2, D_3).$$

Then we have to fit the conditional expectation ml_l

$$\ell_0(\tilde{X}) = E[Y \mid \tilde{X}] = E[Y \mid X, D_2, D_3].$$

Therefore ml_l depends on $X$, $D_2$ and $D_3$: with four control variables plus the two remaining treatments, it sees six features, and its predictions differ across treatments, which gives the shape (n_obs, n_rep, n_treat). The same holds for ml_m, which fits $E[D_1 \mid X, D_2, D_3]$.

I will close this issue since this is intended behavior.
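To see concretely where the six features come from, here is a minimal sketch (not DoubleML internals, and ignoring cross-fitting): refitting the learners by hand on the enlarged control set for d1 reproduces the (6,) importance shape from the report. The names ml_l_manual, ml_m_manual and controls_for_d1 are illustrative, and the data mirrors the snippet above.

import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Same kind of toy data as in the snippet above.
rng = np.random.default_rng(0)
test1 = pd.DataFrame(
    rng.standard_normal((100, 8)),
    columns=["d1", "d2", "d3", "x1", "x2", "x3", "x4", "y"],
)

# When estimating the coefficient on d1, the remaining treatments d2 and d3
# are moved into the control set, so the nuisance learners see 6 features.
controls_for_d1 = test1[["x1", "x2", "x3", "x4", "d2", "d3"]]

ml_l_manual = XGBRegressor(random_state=0)
ml_l_manual.fit(controls_for_d1, test1["y"])    # approximates E[Y | X, D2, D3]
print(ml_l_manual.feature_importances_.shape)   # (6,)

ml_m_manual = XGBRegressor(random_state=0)
ml_m_manual.fit(controls_for_d1, test1["d1"])   # approximates E[D1 | X, D2, D3]
print(ml_m_manual.feature_importances_.shape)   # (6,)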