[Request]: Catboost!

Question

Closed this issue a year ago · 4 comments

What is it about?

Hello! ✋
I have been looking for good examples of how to integrate Wandb with CatBoost but have not found anything so far. Would it be possible to have an example of Wandb + CatBoost?
Thanks!

Answer 1 · 2023-01-13T11:34:24.000Z

Hi! We actually have an integration, although it's not documented yet (we're on it!).

import wandb
from catboost import CatBoostClassifier, Pool, datasets
from wandb.catboost import WandbCallback, log_summary

# Load the sample dataset; column 0 holds the label, the remaining columns are used as features.
train_df, _ = datasets.msrank_10k()
X, Y = train_df[train_df.columns[1:]], train_df[train_df.columns[0]]
pool = Pool(
    data=X[:1000],
    label=Y[:1000],
    feature_names=list(X.columns),
)

classifier = CatBoostClassifier(depth=2, random_seed=0, iterations=10, verbose=False)

# Start a run, log metrics during training via the callback, then log a summary of the fitted model.
wandb.init(project="catboost-test")
classifier.fit(pool, callbacks=[WandbCallback()])
log_summary(classifier, save_model_checkpoint=True)

wandb test code: https://github.com/wandb/wandb/blob/main/tests/functional_tests/t0_main/catboost/t1_regression.py

full wandb Integration code: https://github.com/wandb/wandb/blob/main/wandb/integration/catboost/catboost.py

Answer 2 · 2023-01-13T12:37:20.000Z

Is the callback compatible with GPU?
My code:

import catboost as cb
import wandb
from wandb.catboost import WandbCallback, log_summary

train_cb = cb.Pool(train_x, train_y)
test_cb = cb.Pool(test_x, test_y)

cb_params = {
    "loss_function": "Logloss", 
    "boosting_type": "Plain", 
    "depth": 8,
    "learning_rate": 0.04,
    "colsample_bylevel": 1.0,
    "random_seed": 64,
    "custom_metric": ["NDCG", "AUC", "CrossEntropy", 'PrecisionAt:top=10', 'RecallAt:top=10', 'MAP:top=10'], 
    'use_best_model': True,
    'task_type': 'GPU',
    "metric_period": 10,
    "iterations": 50,
    "max_ctr_complexity": 2,
}
clf = cb.CatBoostClassifier(**cb_params)
wandb.init(project="catboost-test")
clf.fit(
    train_cb,
    eval_set=test_cb,
    verbose=10,
    early_stopping_rounds=100,
    callbacks=[WandbCallback()]
    # plot=True,
)
log_summary(clf)

The catboost error:

---------------------------------------------------------------------------
CatBoostError                             Traceback (most recent call last)
/tmp/ipykernel_47619/376717773.py in <cell line: 2>()
      1 wandb.init(project="catboost-test")
----> 2 clf.fit(
      3     train_cb,
      4     eval_set=test_cb,
      5     verbose=10,

~/mambaforge/envs/rec/lib/python3.9/site-packages/catboost/core.py in fit(self, X, y, cat_features, text_features, embedding_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
   5126             CatBoostClassifier._check_is_compatible_loss(params['loss_function'])
   5127 
-> 5128         self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
   5129                   eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period,
   5130                   silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)

~/mambaforge/envs/rec/lib/python3.9/site-packages/catboost/core.py in _fit(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
   2337             raise CatBoostError("y may be None only when X is an instance of catboost.Pool or string")
   2338 
-> 2339         train_params = self._prepare_train_params(
   2340             X=X, y=y, cat_features=cat_features, text_features=text_features, embedding_features=embedding_features,
   2341             pairs=pairs, sample_weight=sample_weight, group_id=group_id, group_weight=group_weight,

~/mambaforge/envs/rec/lib/python3.9/site-packages/catboost/core.py in _prepare_train_params(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks)
   2264         _check_param_types(params)
   2265         params = _params_type_cast(params)
-> 2266         _check_train_params(params)
   2267 
   2268         if params.get('eval_fraction', 0.0) != 0.0:

_catboost.pyx in _catboost._check_train_params()

_catboost.pyx in _catboost._check_train_params()

_catboost.pyx in _catboost._PreprocessParams.__init__()

CatBoostError: User defined loss functions, metrics and callbacks are not supported for GPU

edit: catboost version == '1.1.1'

Answer 3 · 2023-01-13T14:13:21.000Z

Hmm, it looks like CatBoost callbacks aren't supported on GPU. I guess this is a wider CatBoost issue.
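One possible workaround, sketched here rather than taken from the wandb integration: skip the callback when training on GPU and log the recorded metrics after `fit` returns. The helpers `uses_gpu` and `evals_result_to_rows` are hypothetical names introduced for illustration. The sketch assumes the metrics come from CatBoost's `get_evals_result()`, which returns a nested dict of the form `{dataset: {metric: [values]}}`.

```python
from typing import Any, Dict, List


def uses_gpu(params: Dict[str, Any]) -> bool:
    """Return True if the CatBoost params request GPU training."""
    return str(params.get("task_type", "CPU")).upper() == "GPU"


def evals_result_to_rows(
    evals_result: Dict[str, Dict[str, List[float]]]
) -> List[Dict[str, float]]:
    """Flatten {dataset: {metric: [values]}} into one dict per recorded position.

    Keys look like "validation/Logloss". The list index is the position in the
    recorded history, which is not necessarily the boosting iteration number
    when metric_period is greater than 1.
    """
    rows: List[Dict[str, float]] = []
    for dataset_name, metrics in evals_result.items():
        for metric_name, values in metrics.items():
            for i, value in enumerate(values):
                # Grow the list lazily so metrics of different lengths still fit.
                while len(rows) <= i:
                    rows.append({})
                rows[i][f"{dataset_name}/{metric_name}"] = value
    return rows


# Hypothetical usage with the objects from the question above (not run here):
#   callbacks = None if uses_gpu(cb_params) else [WandbCallback()]
#   clf.fit(train_cb, eval_set=test_cb, callbacks=callbacks)
#   for row in evals_result_to_rows(clf.get_evals_result()):
#       wandb.log(row)
```

This gives up live updates during training, since everything is logged once training has finished, but it avoids passing a callback to a GPU run.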

Answer 4 · 2023-01-13T14:47:40.000Z

It seems to be an issue that nobody will address on the CatBoost side, so I'm closing this here. Thanks for the fast response! 🚀