catboost/catboost

"RuntimeError: Attempt to pop from an empty stack" is raised when running models fit in parallel with threads.

arik-v opened this issue · 5 comments

Problem: "RuntimeError: Attempt to pop from an empty stack" is raised when running models fit in parallel with threads.
Similar to: #1855
The custom logger stack is not thread safe and raises the RuntimeError.

This does not happen in version 1.2.2, where the code finishes successfully.

How to reproduce:


import numpy as np
from joblib import Parallel, delayed
from catboost import CatBoostRegressor

rng = np.random.default_rng(seed=123)
X = rng.standard_normal(size=(1000, 10))
coef = rng.standard_normal(size=(10, 1))
y = (X @ coef).reshape(-1) + rng.standard_normal(size=1000)

def fit_model(X, y):
    model = CatBoostRegressor(silent=True)
    model.fit(X, y)
    return model

n_models = 10
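# Fit the models concurrently in threads; the call is wrapped in parentheses
# so that the generator expression is a valid continuation of the statement.
models = Parallel(n_jobs=n_models, verbose=0, prefer="threads")(
    delayed(fit_model)(X, y) for i in range(n_models)
)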

catboost version: 1.2.3
Operating System: Amazon Linux 2023.4.20240319
CPU: Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 4
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz

GPU: none

I get the same problem when running Optuna with n_jobs=-1.

[W 2024-04-09 17:36:10,609] Trial 99 failed with parameters: {'iterations': 1831, 'learning_rate': 0.08854663531454893, 'subsample': 0.8477806412881673, 'depth': 6, 'border_count': 182, 'l2_leaf_reg': 60.04725887937673, 'colsample_bylevel': 0.768423720172805} because of the following error: RuntimeError('Attempt to pop from an empty stack').

It happens at the end of a process.

We close an issue once the fix is committed to the master branch. The new CatBoost release, 1.2.5, includes the fix.

I installed CatBoost 1.2.5, but I still have this error when running with multiple threads. I'm getting similar errors to the ones posted above.

Then please describe in as much detail as possible (CPU, GPU, OS, Python version, exactly how you installed catboost) how to reproduce it. I tried the example in this ticket and it works fine with CatBoost 1.2.5.
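The details requested above could be gathered in one pass with something like the following. This is a minimal sketch using only the Python standard library; the helper name `describe_environment` is hypothetical and not part of CatBoost. It reports the installed package version alongside the location Python would import the package from, which may help spot a stale or shadowed install.

```python
import importlib.metadata
import importlib.util
import platform
import sys


def describe_environment(package="catboost"):
    """Collect basic details about the interpreter, the OS, and one package.

    Hypothetical helper for bug reports; not part of CatBoost.
    """
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None  # the package is not installed in this environment

    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        spec = None

    return {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
        "package": package,
        "installed_version": version,
        "import_location": spec.origin if spec else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

In a notebook, running this in the same kernel that produced the error shows what that kernel's environment resolves to, which can differ from the output of `pip` in a separate shell.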

Running with n_jobs = 4

[W 2024-04-22 16:55:15,244] Trial 21 failed with parameters: {'objective': 'CrossEntropy', 'colsample_bylevel': 0.01556629158209212, 'depth': 10, 'bootstrap_type': 'Bernoulli', 'l2_reg': 0.4445241346856548, 'score': 'Cosine', 'grow_pol': 'Depthwise', 'samp_freq': 'PerTree', 'subsample_bern': 0.10263351289964752} because of the following error: RuntimeError('Attempt to pop from an empty stack').
Traceback (most recent call last):
  File "/home/idies/miniconda3/lib/python3.9/site-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_1925/1406441657.py", line 57, in objective
    optuna_model.fit(X_train, y_train)
  File "/home/idies/miniconda3/lib/python3.9/site-packages/catboost/core.py", line 5201, in fit
    callbacks : list, optional (default=None)
  File "/home/idies/miniconda3/lib/python3.9/site-packages/catboost/core.py", line 2433, in _fit
    train_pool, _ = self._dataset_train_eval_split(train_pool, params, save_eval_pool=False)
  File "/home/idies/miniconda3/lib/python3.9/contextlib.py", line 126, in __exit__
    next(self.gen)
  File "/home/idies/miniconda3/lib/python3.9/site-packages/catboost/core.py", line 186, in log_fixup
    finally:
  File "/home/idies/miniconda3/lib/python3.9/site-packages/catboost/core.py", line 165, in pop
    return
RuntimeError: Attempt to pop from an empty stack
[W 2024-04-22 16:55:15,245] Trial 21 failed with value None.

File ~/miniconda3/lib/python3.9/site-packages/catboost/core.py:2433, in _fit(self, X, y, cat_features, text_features, embedding_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, plot_file, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
2417 train_pool = _build_train_pool(
2418 X,
2419 y,
(...)
2430 column_description
2431 )
2432 if params.get('eval_fraction', 0.0) != 0.0:
-> 2433 train_pool, _ = self._dataset_train_eval_split(train_pool, params, save_eval_pool=False)
2435 self.get_feature_importance(data=train_pool, type=EFstrType.PredictionValuesChange)
2436 else:

File ~/miniconda3/lib/python3.9/contextlib.py:126, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
124 if typ is None:
125 try:
--> 126 next(self.gen)
127 except StopIteration:
128 return False

File ~/miniconda3/lib/python3.9/site-packages/catboost/core.py:186, in log_fixup(log_cout, log_cerr)
184 try:
185 yield
--> 186 finally:
187 _custom_loggers_stack.pop()
190 def _cast_to_base_types(value):
191 # NOTE: Special case, avoiding new list creation.

File ~/miniconda3/lib/python3.9/site-packages/catboost/core.py:165, in pop(self)
162 with self._lock:
163 if self._owning_thread_id != threading.current_thread().ident:
164 # because push from other threads does nothing
--> 165 return
166 if not self._stack:
167 raise RuntimeError('Attempt to pop from an empty stack')

RuntimeError: Attempt to pop from an empty stack
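For comparison, one common way to keep push/pop pairs balanced under threads is to give each thread its own stack via `threading.local`. The sketch below illustrates that general technique only; the class name `ThreadLocalStack` is hypothetical, and this is not CatBoost's implementation or the fix that was shipped.

```python
import threading


class ThreadLocalStack:
    """Hypothetical per-thread stack: each thread pushes to and pops from its own list."""

    def __init__(self):
        self._local = threading.local()

    def _items(self):
        # Create the list lazily the first time a given thread touches the stack.
        if not hasattr(self._local, "items"):
            self._local.items = []
        return self._local.items

    def push(self, item):
        self._items().append(item)

    def pop(self):
        items = self._items()
        if not items:
            raise RuntimeError("Attempt to pop from an empty stack")
        return items.pop()


stack = ThreadLocalStack()
results = []


def worker(name):
    # Each thread sees only what it pushed itself, so its pop always matches.
    stack.push(name)
    results.append(stack.pop())


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off is that state pushed in one thread is invisible to the others, so this design only fits when each thread is meant to manage its own entries. Whether that holds for CatBoost's custom loggers is an assumption here.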

catboost version: 1.2.5
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
CPU: Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 2
Core(s) per socket: 5
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC 7763 64-Core Processor

No GPU, Python 3.9.17, installed via pip install catboost==1.2.5