[bug] training xgboost doesn't work with a DataFrame, only a NumPy array
Opened this issue · 3 comments
Yarden234 commented
Hello, and thank you for this package.
I came across a problem while trying to use an XGBoost model that was trained on a DataFrame.
So this is my code:
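from xgboost import XGBClassifier
from tree_influence.explainers import BoostIn

# load_csv (defined elsewhere) returns a pandas DataFrame
X_train, X_test, y_train, y_test = load_csv('X_train'), load_csv('X_test'), load_csv('y_train'), load_csv('y_test')
model = XGBClassifier(tree_method='hist')
# NumPy versions of the same data
X_train_val, y_train_val = X_train.values, y_train.values.squeeze()
X_test_val, y_test_val = X_test.values, y_test.values.squeeze()
# train on the DataFrame (this is the failing case)
model.fit(X_train, y_train)
# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)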
This produces the following exception:
Traceback (most recent call last):
File "/home/jupyter/owlytics-data-science/influence/influence.py", line 35, in <module>
explainer = BoostIn().fit(model, X_train, y_train)
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/boostin.py", line 44, in fit
super().fit(model, X, y)
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/base.py", line 31, in fit
self.model_ = parse_model(model, X, y)
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/__init__.py", line 33, in parse_model
trees, params = parse_xgb_ensemble(model)
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in parse_xgb_ensemble
trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in <listcomp>
trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 88, in _parse_xgb_tree
node_dict = _parse_line(line)
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 190, in _parse_line
res['feature'], res['threshold'] = _parse_decision_node_line(line)
File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 201, in _parse_decision_node_line
feature_ndx = int(feature_str[1:])
ValueError: invalid literal for int() with base 10: 'ecent_beta_blockers_change'
However, training on X_train_val and y_train_val (which are NumPy arrays) works perfectly well.
It would be great if you could support training with DataFrames as well.
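From the last frame of the traceback, it looks like the parser assumes XGBoost's default feature names (f0, f1, ...), which is what a model trained on a NumPy array gets, and strips the first character before calling int(). With DataFrame column names that fails. Below is a minimal sketch of the failure and one possible way to handle it. The helper parse_feature_index and its feature_names argument are hypothetical names for illustration, not part of tree_influence.

```python
# Reproduce the failing conversion from the traceback using only the
# standard library: the first character is dropped, then int() is called.
try:
    int('recent_beta_blockers_change'[1:])
except ValueError as err:
    message = str(err)


def parse_feature_index(feature_str, feature_names=None):
    """Hypothetical helper: map a feature token from a tree dump to a column index.

    feature_names is assumed to be the list of DataFrame column names,
    in the order used for training.
    """
    # If the token is a known column name, use its position.
    if feature_names is not None and feature_str in feature_names:
        return feature_names.index(feature_str)
    # Otherwise assume the default naming: 'f' followed by the column index.
    return int(feature_str[1:])


columns = ['age', 'recent_beta_blockers_change']
default_index = parse_feature_index('f12')
named_index = parse_feature_index('recent_beta_blockers_change', columns)
```

Here message holds the same ValueError text as in the traceback above, while the helper resolves both the default token and the column name to an integer index.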
Thanks again!
jjbrophy47 commented
Hi Yarden234! Thanks for bringing this up. I believe I've fixed this issue now in version 0.1.1. Please give it a try and feel free to open this issue back up if it's not working. Thanks again!
aclarkse commented
Hi there,
I am still encountering this error. Could you take another look at it? Thanks!
jjbrophy47 commented
Hi @aclarkse, can you provide a fully reproducible example, please?