DoubleML/doubleml-for-py

[Bug]: KeyError in DoubleMLPLIV.fit() with multiple instruments and store_predictions=True

vnastl opened this issue · 1 comment

Describe the bug

When the data contains multiple instruments, DoubleMLPLIV.fit() raises a KeyError if it is called with store_predictions=True.

Minimum reproducible code snippet

import numpy as np
import doubleml as dml
from doubleml.datasets import make_pliv_CHS2015
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone
np.random.seed(3141)
learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
ml_l = clone(learner)
ml_m = clone(learner)
ml_r = clone(learner)
obj_dml_data = make_pliv_CHS2015(n_obs=500, alpha=1.0, dim_x=10, dim_z=10, return_type='DoubleMLData')
dml_pliv_obj = dml.DoubleMLPLIV(obj_dml_data, ml_l, ml_m, ml_r)
dml_pliv_fit = dml_pliv_obj.fit(store_predictions=True)

Expected Result

Predictions are stored for every learner listed in 'params_names', i.e. for:

print(dml_pliv_obj.params_names)

['ml_l',
 'ml_r',
 'ml_m_Z1',
 'ml_m_Z2',
 'ml_m_Z3',
 'ml_m_Z4',
 'ml_m_Z5',
 'ml_m_Z6',
 'ml_m_Z7',
 'ml_m_Z8',
 'ml_m_Z9',
 'ml_m_Z10']

Actual Result

Running the snippet raises the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/sb/q_1b_jtx6_x55nw95r50s0tr0002mt/T/ipykernel_44055/2685974828.py in <module>
     11 obj_dml_data = make_pliv_CHS2015(n_obs=500, alpha=1.0, dim_x=10, dim_z=10, return_type='DoubleMLData')
     12 dml_pliv_obj = dml.DoubleMLPLIV(obj_dml_data, ml_l, ml_m, ml_r)
---> 13 dml_pliv_fit = dml_pliv_obj.fit(store_predictions=True)

/opt/anaconda3/envs/py39/lib/python3.10/site-packages/doubleml/double_ml.py in fit(self, n_jobs_cv, keep_scores, store_predictions, store_models)
    500 
    501                 if store_predictions:
--> 502                     self._store_predictions(preds['predictions'])
    503                 if store_models:
    504                     self._store_models(preds['models'])

/opt/anaconda3/envs/py39/lib/python3.10/site-packages/doubleml/double_ml.py in _store_predictions(self, preds)
   1000     def _store_predictions(self, preds):
   1001         for learner in self.params_names:
-> 1002             self._predictions[learner][:, self._i_rep, self._i_treat] = preds[learner]
   1003 
   1004     def _store_models(self, models):

KeyError: 'ml_m_Z1'
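
The traceback suggests that _store_predictions loops over every name in params_names and looks each one up in the preds dictionary, so any name missing from preds raises the KeyError. A minimal sketch of that failure pattern in plain Python follows. The helper store_predictions, the shortened name list, and the contents of preds are hypothetical stand-ins chosen for illustration; which keys preds actually holds inside DoubleML is an assumption, not something taken from the library.

```python
# Hypothetical stand-ins that only mirror the names in the traceback;
# this is not DoubleML code.
params_names = ["ml_l", "ml_r", "ml_m_Z1", "ml_m_Z2"]
n_obs = 5

# Storage is allocated for every learner name.
predictions = {name: [None] * n_obs for name in params_names}

# Assumption for illustration: the fitted predictions carry no
# per-instrument keys such as 'ml_m_Z1'.
preds = {"ml_l": [0.0] * n_obs, "ml_r": [1.0] * n_obs}


def store_predictions(predictions, preds, names):
    """Copy the predictions for each learner name into the storage dict."""
    for name in names:
        # Raises KeyError as soon as a name is missing from preds.
        predictions[name][:] = preds[name]


try:
    store_predictions(predictions, preds, params_names)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

print(missing_key)
```

In this sketch the first two learners are stored before the lookup fails on the first per-instrument name, which matches the key reported in the traceback.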

Versions

macOS-10.16-x86_64-i386-64bit
Python 3.10.6 (main, Oct 24 2022, 11:04:34) [Clang 12.0.0 ]
DoubleML 0.6.dev0
Scikit-Learn 1.1.3

Thanks for reporting the issue.
It will be fixed with #182. The same bug occurred when calculating the RMSE for the nuisance functions. I will leave the issue open until the fix is merged into the dev version.