Require psutil as a dependency to gracefully kill processes when OOM
Closed this issue · 0 comments
pcm32 commented
Pods that run out of memory (OOM) fail before their processes can be killed gracefully, producing the following output:
/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
warnings.warn("Failed to kill subprocesses on this platform. Please"
/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
warnings.warn("Failed to kill subprocesses on this platform. Please"
/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
warnings.warn("Failed to kill subprocesses on this platform. Please"
Traceback (most recent call last):
  File "/usr/local/bin/sccaf-assess", line 71, in <module>
    y_prob, y_pred, y_test, clf, cvsm, acc = sf.SCCAF_assessment(X, y, n_jobs=args.cores)
  File "/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py", line 265, in SCCAF_assessment
    return self_projection(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py", line 352, in self_projection
    cvs = cross_val_score(clf, X_train, np.array(y_train), cv=cv, scoring='accuracy', n_jobs=n_jobs)
  File "/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 391, in cross_val_score
    error_score=error_score)
  File "/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 232, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 1016, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 908, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 554, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}
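Until psutil is declared as a dependency, a check at startup could surface the problem earlier than the loky warning does. The following is only a rough sketch, not code from SCCAF: the helper names `module_available` and `psutil_available` are hypothetical, and it assumes the check would run before any parallel work starts. It uses only the standard library.

```python
import importlib.util
import warnings


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def psutil_available() -> bool:
    """Hypothetical startup check: warn once if psutil is missing.

    Without psutil, loky reports that it cannot kill subprocesses
    on this platform (see the UserWarning in the log above).
    """
    if module_available("psutil"):
        return True
    warnings.warn(
        "psutil is not installed; worker subprocesses may not be "
        "cleaned up if a worker is terminated. "
        "See https://github.com/giampaolo/psutil"
    )
    return False


if __name__ == "__main__":
    print("psutil available:", psutil_available())
```

The more direct fix requested here would be to list psutil among the package's install requirements so that it is always present in the container image.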