CatBoost Library is complaining about unhashable class
ragrawal opened this issue · 3 comments
ragrawal commented
What is the bug?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-83c5a63cefff> in <module>
19 xgbStep = make_step(CatBoostClassifier)()(x, y)
20 model = Model(x, xgbStep, y)
---> 21 model.fit(dataset[:,0:8], dataset[:,8])
/usr/local/anaconda3/envs/interview/lib/python3.6/site-packages/baikal/_core/model.py in fit(self, X, y, **fit_params)
412
413 ys = [results_cache[t] for t in node.targets]
--> 414 fit_params = fit_params_steps.get(node.step, {})
415
416 if node.fit_compute_func is not None:
TypeError: unhashable type: 'CatBoostClassifier'
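The lookup at line 414 uses the step object itself as a dict key, which requires the object to be hashable. A minimal sketch of the likely mechanism (FakeEstimator is a hypothetical stand-in; the assumption is that CatBoostClassifier defines __eq__ without also defining __hash__, in which case Python sets __hash__ to None and instances become unhashable):

class FakeEstimator:
    # defining __eq__ without __hash__ makes Python set __hash__ to None
    def __eq__(self, other):
        return isinstance(other, FakeEstimator)

print(FakeEstimator.__hash__)              # None
fit_params_steps = {}
fit_params_steps.get(FakeEstimator(), {})  # TypeError: unhashable type: 'FakeEstimator'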
How to reproduce it?
import pandas as pd
from baikal import Input, Model, make_step
from catboost import CatBoostClassifier

# load data
df = pd.read_csv(
    'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv',
    header=None)
dataset = df.values

# build a graph with a single CatBoost step wrapped via make_step
x = Input()
y = Input()
xgbStep = make_step(CatBoostClassifier)()(x, y)
model = Model(x, xgbStep, y)

# raises TypeError: unhashable type: 'CatBoostClassifier'
model.fit(dataset[:, 0:8], dataset[:, 8])
ragrawal commented
I was able to fix the issue using the following code. However, I'm not sure whether this is the right approach.
import pandas as pd
from baikal import Input, Model, Step
from catboost import CatBoostClassifier

# load data
df = pd.read_csv(
    'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv',
    header=None)
dataset = df.values

class CatBoostClassifierStep(Step, CatBoostClassifier):
    def __init__(self, *args, name=None, n_outputs=1, **kwargs):
        super().__init__(*args, name=name, n_outputs=n_outputs, **kwargs)

    # CatBoostClassifier seems to define __eq__ without __hash__, which makes
    # instances unhashable; defining __hash__ (here, based on the step name)
    # lets Model.fit use the step as a dict key
    def __hash__(self):
        return hash(super().name)

x = Input()
y = Input()
xgbStep = CatBoostClassifierStep()(x, y)
model = Model(x, xgbStep, y)
model.fit(dataset[:, 0:8], dataset[:, 8])
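For reference, a quick check that the fitted model produces predictions. This is only a sketch: it evaluates on the training data and assumes the wrapped step's compute function is CatBoostClassifier.predict.

from sklearn.metrics import accuracy_score

# predictions from the baikal Model built above
y_pred = model.predict(dataset[:, 0:8])
print(accuracy_score(dataset[:, 8], y_pred))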
alegonz commented
@ragrawal
Thank you for the bug report!
Indeed, that's a bug in Model.fit; I'll see what I can do about it. I think I can release a fix for it in 0.4.2. In the meantime, please use the workaround you pasted, which, though a bit cumbersome, is valid and seems to be the most sensible approach.
ragrawal commented
Hi @alegonz, I just found out that the above solution doesn't work very well with serialization. If I serialize my trained model and then try to read it back, I get the following error: 'CatBoostClassifierStep' object has no attribute '_nodes'