MI-numba-3mr ranking throws an error on missing relational scores
Closed this issue · 2 comments
miha-jenko commented
Running the latest release (0.94.1), I have encountered two errors.
The first occurs when using target ranking with the default combination limit:
outrank \
    --task all \
    --data_path data_path/ \
    --data_source ob-vw \
    --subsampling 10 \
    --label_column label \
    --heuristic MI-numba-3mr \
    --include_noise_baseline_features True \
    --interaction_order 1 \
    --transformers none \
    --target_ranking_only True \
    --output_folder output_path/
Error:
Traceback (most recent call last):
  File "bin/outrank", line 8, in <module>
    sys.exit(main())
  File "lib/python3.8/site-packages/outrank/__main__.py", line 246, in main
    outrank_task_conduct_ranking(args)
  File "lib/python3.8/site-packages/outrank/task_ranking.py", line 233, in outrank_task_conduct_ranking
    mrmrmr_ranking = rank_features_3MR(
  File "lib/python3.8/site-packages/outrank/algorithms/importance_estimator.py", line 175, in rank_features_3MR
    feature_relation = calc_higher_order(feat, False)
  File "lib/python3.8/site-packages/outrank/algorithms/importance_estimator.py", line 162, in calc_higher_order
    values.append(relational_dict[(feat, feature)])
KeyError: ('CONTROL-target', 'feature_X')
The second occurs when using the suggested combination limit and higher subsampling:
outrank \
    --task all \
    --data_path data_path/ \
    --data_source ob-vw \
    --subsampling 300 \
    --label_column label \
    --heuristic MI-numba-3mr \
    --include_noise_baseline_features True \
    --interaction_order 1 \
    --transformers none \
    --target_ranking_only True \
    --combination_number_upper_bound 2048 \
    --output_folder output_path/
Error:
Traceback (most recent call last):
  File "lib/python3.8/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "lib/python3.8/site-packages/multiprocess/pool.py", line 48, in mapstar
    return list(map(*args))
  File "lib/python3.8/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "outrank/outrank/core_ranking.py", line 126, in get_grounded_importances_estimate
    return get_importances_estimate_pairwise(combination, args, tmp_df=tmp_df)
  File "outrank/outrank/algorithms/importance_estimator.py", line 102, in get_importances_estimate_pairwise
    vector_first = tmp_df[[feature_one]].values.ravel()
  File "lib/python3.8/site-packages/pandas/core/frame.py", line 3767, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5877, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5938, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['feature_Y AND_REL feature_Z'], dtype='object')] are in the [columns]"
Both errors seem to suggest that the combination limit restricts the set of combinations that get scored, but that this restricted set is not respected when the scores are retrieved.
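To illustrate the suspected mechanism, here is a minimal sketch. It is not OutRank's actual code: it only assumes that scores live in a dict keyed by feature pairs (as `relational_dict` in the first traceback suggests) and that a cap truncates the list of pairs before scoring. The function names `score_pairs` and `lookup_all`, the toy feature names, and the placeholder score are all hypothetical.

```python
from itertools import combinations


def score_pairs(features, upper_bound):
    """Score only the first `upper_bound` feature pairs (hypothetical cap)."""
    pairs = list(combinations(features, 2))[:upper_bound]
    # Placeholder score; a real heuristic would compute e.g. mutual information.
    return {pair: 1.0 for pair in pairs}


def lookup_all(relational_dict, feat, features):
    """Retrieve scores for every other feature, assuming all pairs were scored."""
    values = []
    for feature in features:
        if feature != feat:
            # Raises KeyError if the pair was dropped by the cap.
            values.append(relational_dict[(feat, feature)])
    return values


features = ["a", "b", "c", "d"]
# Six pairs exist in total; the cap keeps only ("a", "b") and ("a", "c").
relational_dict = score_pairs(features, upper_bound=2)

try:
    lookup_all(relational_dict, "a", features)
    missing_pair = None
except KeyError as err:
    missing_pair = err.args[0]  # the first pair that was never scored
```

If this is indeed what happens, the retrieval step would need to iterate over the pairs that were actually scored rather than over all features.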
As a bonus, @SkBlaz suggested that we warn users upfront about which combinations will not be calculated.
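A rough sketch of what such an upfront warning could look like, using only the standard library. This is an assumption about one possible approach, not how OutRank selects combinations: the function `select_combinations`, its `upper_bound` parameter, and the truncation strategy are hypothetical.

```python
import warnings
from itertools import combinations


def select_combinations(features, upper_bound):
    """Split feature pairs into kept and dropped, warning about the dropped ones."""
    all_pairs = list(combinations(features, 2))
    kept = all_pairs[:upper_bound]
    dropped = all_pairs[upper_bound:]
    if dropped:
        # Show only a short preview so the message stays readable.
        preview = ", ".join(f"({a}, {b})" for a, b in dropped[:5])
        warnings.warn(
            f"{len(dropped)} of {len(all_pairs)} combinations exceed the limit "
            f"of {upper_bound} and will not be scored, e.g. {preview}",
            UserWarning,
        )
    return kept, dropped


kept, dropped = select_combinations(["a", "b", "c", "d"], upper_bound=2)
```

Returning the dropped pairs alongside the kept ones would also let later steps skip them explicitly instead of failing on lookup.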
miha-jenko commented
Retesting 0.95 with a larger dataset, will let you know.
miha-jenko commented
This was fixed.