[Request]: Catboost!
What is it about?
Hello! ✋
I have been looking for good examples of how to integrate Wandb with CatBoost but have not found anything so far. Would it be possible to have an example of Wandb + CatBoost?
Thanks!
Hi! We actually have an integration, although it's not documented yet (on it!)
import wandb
from catboost import CatBoostClassifier, Pool, datasets
from wandb.catboost import WandbCallback, log_summary
train_df, _ = datasets.msrank_10k()
X, Y = train_df[train_df.columns[1:]], train_df[train_df.columns[0]]
pool = Pool(
    data=X[:1000],
    label=Y[:1000],
    feature_names=list(X.columns),
)
classifier = CatBoostClassifier(depth=2, random_seed=0, iterations=10, verbose=False)
wandb.init(project="catboost-test")
classifier.fit(pool, callbacks=[WandbCallback()])
log_summary(classifier, save_model_checkpoint=True)
wandb test code: https://github.com/wandb/wandb/blob/main/tests/functional_tests/t0_main/catboost/t1_regression.py
full wandb Integration code: https://github.com/wandb/wandb/blob/main/wandb/integration/catboost/catboost.py
Is the callback compatible with GPU?
My code:
import catboost as cb
import wandb
from wandb.catboost import WandbCallback, log_summary
train_cb = cb.Pool(train_x, train_y,)
test_cb = cb.Pool(test_x, test_y)
cb_params = {
    "loss_function": "Logloss",
    "boosting_type": "Plain",
    "depth": 8,
    "learning_rate": 0.04,
    "colsample_bylevel": 1.0,
    "random_seed": 64,
    "custom_metric": ["NDCG", "AUC", "CrossEntropy", "PrecisionAt:top=10", "RecallAt:top=10", "MAP:top=10"],
    "use_best_model": True,
    "task_type": "GPU",
    "metric_period": 10,
    "iterations": 50,
    "max_ctr_complexity": 2,
}
clf = cb.CatBoostClassifier(**cb_params)
wandb.init(project="catboost-test")
clf.fit(
    train_cb,
    eval_set=test_cb,
    verbose=10,
    early_stopping_rounds=100,
    callbacks=[WandbCallback()],
    # plot=True,
)
log_summary(clf)
The catboost error:
---------------------------------------------------------------------------
CatBoostError Traceback (most recent call last)
/tmp/ipykernel_47619/376717773.py in <cell line: 2>()
1 wandb.init(project="catboost-test")
----> 2 clf.fit(
3 train_cb,
4 eval_set=test_cb,
5 verbose=10,
~/mambaforge/envs/rec/lib/python3.9/site-packages/catboost/core.py in fit(self, X, y, cat_features, text_features, embedding_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
5126 CatBoostClassifier._check_is_compatible_loss(params['loss_function'])
5127
-> 5128 self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
5129 eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period,
5130 silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
~/mambaforge/envs/rec/lib/python3.9/site-packages/catboost/core.py in _fit(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
2337 raise CatBoostError("y may be None only when X is an instance of catboost.Pool or string")
2338
-> 2339 train_params = self._prepare_train_params(
2340 X=X, y=y, cat_features=cat_features, text_features=text_features, embedding_features=embedding_features,
2341 pairs=pairs, sample_weight=sample_weight, group_id=group_id, group_weight=group_weight,
~/mambaforge/envs/rec/lib/python3.9/site-packages/catboost/core.py in _prepare_train_params(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks)
2264 _check_param_types(params)
2265 params = _params_type_cast(params)
-> 2266 _check_train_params(params)
2267
2268 if params.get('eval_fraction', 0.0) != 0.0:
_catboost.pyx in _catboost._check_train_params()
_catboost.pyx in _catboost._check_train_params()
_catboost.pyx in _catboost._PreprocessParams.__init__()
CatBoostError: User defined loss functions, metrics and callbacks are not supported for GPU
edit: catboost version == '1.1.1'
Hmm, it looks like CatBoost callbacks aren't supported on GPU. I guess this is a wider CatBoost issue.
It seems to be an issue that nobody is going to address on the CatBoost side, so I'm closing this here. Thanks for the fast response! 🚀
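For anyone landing here with the same GPU error: one possible workaround is to drop `callbacks=` from `fit()` when `task_type` is `GPU` and replay the recorded metric history to W&B after training. The sketch below is not part of the wandb integration. The helper names `evals_result_to_rows` and `log_evals_result` are made up for illustration, and it assumes `clf.get_evals_result()` returns a nested dict of the form `{dataset: {metric: [values]}}`. It has not been run against catboost 1.1.1 on a GPU.

```python
def evals_result_to_rows(evals_result):
    """Flatten {dataset: {metric: [v0, v1, ...]}} into one dict per recorded step.

    Keys in each row look like "learn/Logloss" or "validation/AUC".
    """
    rows = {}
    for dataset, metrics in evals_result.items():
        for metric, values in metrics.items():
            for i, value in enumerate(values):
                rows.setdefault(i, {})[f"{dataset}/{metric}"] = value
    return [rows[i] for i in sorted(rows)]


def log_evals_result(clf, log_fn):
    """Replay a fitted model's metric history through log_fn(row, step=i).

    `clf` is any object with a get_evals_result() method (e.g. a fitted
    CatBoostClassifier); `log_fn` is expected to behave like wandb.log.
    """
    for i, row in enumerate(evals_result_to_rows(clf.get_evals_result())):
        log_fn(row, step=i)


# Hypothetical usage with the GPU setup from this thread:
#
#   wandb.init(project="catboost-test")
#   clf.fit(train_cb, eval_set=test_cb, verbose=10)  # no callbacks on GPU
#   log_evals_result(clf, wandb.log)
```

Note that the `step` here is just the position in the recorded history. With `metric_period` set to something other than 1, that position may not match the boosting iteration, so treat the x-axis accordingly. Metrics also only show up in W&B once training has finished, not live as with `WandbCallback`.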