SCCAF/sccaf

Require psutil as a dependency to gracefully kill processes when OOM

Closed this issue · 0 comments

pcm32 commented

Pods that are killed for running out of memory (OOM) fail before their worker subprocesses can be gracefully terminated, with the following output:

/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
  warnings.warn("Failed to kill subprocesses on this platform. Please"
/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
  warnings.warn("Failed to kill subprocesses on this platform. Please"
/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
  warnings.warn("Failed to kill subprocesses on this platform. Please"
Traceback (most recent call last):
  File "/usr/local/bin/sccaf-assess", line 71, in <module>
    y_prob, y_pred, y_test, clf, cvsm, acc = sf.SCCAF_assessment(X, y, n_jobs=args.cores)
  File "/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py", line 265, in SCCAF_assessment
    return self_projection(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py", line 352, in self_projection
    cvs = cross_val_score(clf, X_train, np.array(y_train), cv=cv, scoring='accuracy', n_jobs=n_jobs)
  File "/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 391, in cross_val_score
    error_score=error_score)
  File "/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 232, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 1016, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 908, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 554, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}
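
The warnings above come from joblib's bundled loky backend, which asks for psutil in order to kill subprocesses. The fix the title suggests is to list psutil among the package's declared dependencies. As a complement, a minimal sketch of a startup check is shown below; the function names `module_available` and `warn_if_psutil_missing` are hypothetical and not part of SCCAF or joblib, and the check only reports whether psutil can be imported, it does not prevent the OOM kill itself.

```python
import importlib.util
import warnings


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def warn_if_psutil_missing() -> bool:
    """Warn early if psutil is absent; return whether it was found.

    Hypothetical helper: intended to be called once before any
    parallel work (e.g. cross-validation with n_jobs > 1) starts.
    """
    found = module_available("psutil")
    if not found:
        warnings.warn(
            "psutil is not installed; joblib/loky may be unable to kill "
            "worker subprocesses cleanly. Install it with: pip install psutil"
        )
    return found


if __name__ == "__main__":
    warn_if_psutil_missing()
```

Calling the check at the top of a command-line entry point would surface the missing dependency once, up front, rather than as repeated warnings during worker shutdown.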