microsoft/onnxruntime

Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed

zyl112334 opened this issue · 20 comments

Hi, I am using onnxruntime for inference, but the program fails with the error below. How can I solve this problem? Thanks!

System information
Linux Ubuntu 16.04
Python 3.6.5
onnxruntime 1.8.0
CPU only (4 cores); ONNX Runtime installed from pip.

File "/home/admin/qiyun/target/qiyun/tools/infer/utility.py", line 104, in create_predictor
sess = ort.InferenceSession(model_file_path)
File "/home/admin/.local/lib/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/home/admin/.local/lib/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:142 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed

snnn commented

Are you using cpuset at the same time?

I solved this problem by setting "options = ort.SessionOptions() options.intra_op_num_threads = 1 options.inter_op_num_threads = 1" (the default value for both parameters is 0). How should I understand why this works?
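The workaround quoted above can be written out as a small helper. This is a minimal sketch, not an official recipe: `create_session` is a hypothetical name, and the `ort` module is passed in as an argument so the snippet makes no assumption about how it was imported. `SessionOptions`, `intra_op_num_threads` and `inter_op_num_threads` are the names used in this thread; `sess_options` is, to my knowledge, the keyword `InferenceSession` accepts for them.

```python
def create_session(ort, model_path, intra_threads=1, inter_threads=1):
    """Create an InferenceSession with explicit thread counts.

    `ort` is the imported onnxruntime module (passed in as an argument).
    Leaving both counts at their default of 0 lets ORT choose them itself,
    which is the configuration that produced the error in this thread.
    """
    options = ort.SessionOptions()
    options.intra_op_num_threads = intra_threads
    options.inter_op_num_threads = inter_threads
    return ort.InferenceSession(model_path, sess_options=options)


# Usage (requires onnxruntime; "model.onnx" is a placeholder path):
#   import onnxruntime as ort
#   sess = create_session(ort, "model.onnx")
```

Passing thread counts as arguments makes it easy to try values other than 1 later without touching the session setup.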

I hit the same error. Setting "options = ort.SessionOptions() options.intra_op_num_threads = 1 options.inter_op_num_threads = 1" avoids it, but inference is slow. How can I still run inference on the CPU in a GPU environment?

snnn commented

What if you set intra_op_num_threads to the number of your CPU cores?

It is slower if I set intra_op_num_threads to the number of my CPU cores. So how can I run inference using only the CPU in a GPU environment? Thanks!

snnn commented

How can I infer only use CPU under GPU

You can use the cpu only package: https://pypi.org/project/onnxruntime/ instead of https://pypi.org/project/onnxruntime-gpu/ .
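Besides switching packages, a session can also be restricted to the CPU by passing an explicit provider list. The following is a sketch under assumptions: `cpu_only_providers` is a hypothetical helper, while `"CPUExecutionProvider"`, `ort.get_available_providers()` and the `providers` argument are, as far as I know, available in recent ORT releases and should be checked against the installed version. It only controls where the model runs and does not change the thread affinity behaviour discussed in this issue.

```python
CPU_PROVIDER = "CPUExecutionProvider"


def cpu_only_providers(available):
    """Return a provider list that keeps only the CPU execution provider.

    `available` is expected to be the list returned by
    onnxruntime.get_available_providers().
    """
    if CPU_PROVIDER not in available:
        raise RuntimeError(f"{CPU_PROVIDER} is not available: {available}")
    return [CPU_PROVIDER]


# Usage (requires onnxruntime; "model.onnx" is a placeholder path):
#   import onnxruntime as ort
#   providers = cpu_only_providers(ort.get_available_providers())
#   sess = ort.InferenceSession("model.onnx", providers=providers)
```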

Hi, I ran into the same problem, and I want to use the GPU for ONNX inference. I tried 'options = ort.SessionOptions() options.intra_op_num_threads = 1 options.inter_op_num_threads = 1', but the error became a segmentation fault. Are there any other solutions to this problem?

my environment:
python 3.6.13
onnx 1.10.2
onnxruntime-gpu 1.10.0
torch 1.10.2
torchaudio 0.10.2
torchvision 0.11.3
OS x86_64 GNU/Linux
GCC version Ubuntu 9.3.0
CUDA 11.4
GPU type A100
Driver Version 470.82.01

@snnn just to provide more context to @poem2018 's comment: our onnxruntime-gpu installation on a shared DGX-A100 machine (8x GPUs, 2x AMD CPUs per node) works totally fine when an entire dedicated node is used.

We encounter seg-faults / core dumps / the above exception when it is run on a shared node allocation, where each user is given a dedicated single GPU on the node and shares a fraction of the cores with another user. This is controlled via cpusets, which lock user sessions to GPU-affine cores, e.g.

cat /sys/fs/cgroup/cpuset/single-gpu/gpu0/cpuset.cpus
48-63,176-191

Within that cpuset, you have to share cycles with another user on the paired GPU, if it is in use. cgroup fair scheduling is used for that.
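To compare a cpuset like the one above with the thread count a program requests, the range list first has to be expanded into individual CPU ids. A minimal sketch, where `parse_cpu_list` is a hypothetical helper and the input string is the value shown above:

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list such as '48-63,176-191' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty cpuset file
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


allowed = parse_cpu_list("48-63,176-191")
print(len(allowed), allowed[0], allowed[-1])  # 32 48 191
```

In this allocation the process may use 32 logical CPUs, and none of them has a low id such as 0 or 1.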

I don't believe we had issues with earlier versions of ORT using cpuset, but I would need to recheck. And as @poem2018 indicated, setting the number of threads to 1 does not avoid the issue, so it is not clear whether #10122 would fix this.

#10113 (comment) is there a way to bind specific core affinity?

snnn commented

By default, ONNX Runtime tries to bind each thread to a logical CPU if the user didn't explicitly set intra_op_num_threads. As you can see, this is causing problems, so I'd prefer not to do the binding. If you need to set up thread affinity through the ONNX Runtime API, we can design one and add it to onnxruntime_c_api.h. ONNX Runtime is an open source project; if you already have a design in mind, please let us know.

Any progress? I had the same problem with the 1.10.1 CPU version.

@snnn

Suppose we set intra_op_num_threads to a specific integer or to cpu_count(logical=True).

Then we build an image from our project (with onnx) and start a container. If we constrain the CPU cores for the container, what happens if that number is lower than the intra_op_num_threads value we set?
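One defensive option for the container case is to clamp the requested thread count to the CPUs the process is actually allowed to use. The sketch below is an assumption-laden illustration, not something confirmed in this thread: `usable_cpu_count` and `clamp_threads` are hypothetical helpers, and `os.sched_getaffinity` is only available on Linux. A CPU quota such as Docker's `--cpus` limits CPU time without shrinking the affinity mask, so this only reflects cpuset-style restrictions.

```python
import os


def usable_cpu_count():
    """Number of CPUs this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def clamp_threads(requested, available=None):
    """Limit a requested thread count to the usable CPUs (never below 1)."""
    if available is None:
        available = usable_cpu_count()
    return max(1, min(requested, available))


# Usage (requires onnxruntime):
#   options = ort.SessionOptions()
#   options.intra_op_num_threads = clamp_threads(16)
```

Because the result is never 0, the value is always an explicit thread count rather than the "let ORT decide" default.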

I am using NVIDIA Triton with the onnxruntime backend. When I run Triton as a k8s deployment, I run into the same pthread_setaffinity_np failed problem. Because Triton is already compiled and does not provide a way to set intra_op_num_threads, I wonder whether there is an environment variable for ONNX Runtime to specify intra_op_num_threads?

I see the same issue as described above. I was setting affinity when launching a Docker container with "--cpuset-cpus=32-63,160-191", which should relieve ORT of having to deal with it. Is there something I should set in ORT to avoid the failure?

Hi, I also ran into this issue while using Slurm to submit jobs to a computing cluster. Slurm uses the --cpu-bind=... option to set explicit process affinity binding and control options. This conflicts with ORT when it starts a new session and leads to this error:

Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed

By setting these options (as recommended here)

sessionOptions.SetInterOpNumThreads(1); sessionOptions.SetIntraOpNumThreads(1);

The issue can no longer be observed.

Setting the number of threads used to parallelize the execution of the graph (across nodes) solves the problem, since ORT can no longer choose this by itself. This can potentially be a problem with every job scheduler, but it depends on how the system is set up.
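Under Slurm, an explicit thread count could be derived from the job allocation instead of being hard-coded to 1. This is a sketch under assumptions: `threads_from_slurm` is a hypothetical helper, and `SLURM_CPUS_PER_TASK` is the variable Slurm exports when `--cpus-per-task` is given. Whether a value above 1 avoids the affinity error on a given cluster is not confirmed in this thread.

```python
import os


def threads_from_slurm(environ=None, default=1):
    """Read the per-task CPU allocation from Slurm, else return `default`."""
    if environ is None:
        environ = os.environ
    raw = environ.get("SLURM_CPUS_PER_TASK", "")
    try:
        value = int(raw)
    except ValueError:
        return default  # variable missing or not a number
    return value if value > 0 else default


# Usage (requires onnxruntime):
#   options = ort.SessionOptions()
#   options.intra_op_num_threads = threads_from_slurm()
```

Outside a Slurm job the helper falls back to 1, which matches the setting reported to work above.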

Hi, I use a somewhat hacky method to change the default value globally and prevent such errors.

During development we rely on onnx, onnx-simplify, etc., and by default these implicitly call ORT for inference. With the method above, each of those call sites would need to be fixed one by one. Instead, we use an intrusive method that changes the default value globally so that such errors do not appear.

InferenceSession implements session init by calling _create_inference_session in the constructor:

session_options = self._sess_options if self._sess_options else C.get_default_session_options()
if self._model_path:
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
else:
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)

We can modify the value returned by C.get_default_session_options().

Add the following code to your program to globally modify the default inter_op_num_threads and intra_op_num_threads parameters:

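import onnxruntime as ort

# Fetch the default options object once, then hand it back with
# single-threaded settings whenever ORT asks for its defaults.
_default_session_options = ort.capi._pybind_state.get_default_session_options()

def get_default_session_options_new():
    _default_session_options.inter_op_num_threads = 1
    _default_session_options.intra_op_num_threads = 1
    return _default_session_options

# Replace the module-level getter so that sessions created without
# explicit options pick up the patched defaults.
ort.capi._pybind_state.get_default_session_options = get_default_session_options_new

# other ORT inference code
# ...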
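The patch above stays in place for the rest of the process. If that is too broad, the same replacement could be limited to a block of code. The following is only a sketch: `override_attr` is a hypothetical helper, and it assumes, as the patch above does, that ORT looks up `get_default_session_options` at the time a session is created.

```python
from contextlib import contextmanager


@contextmanager
def override_attr(obj, name, replacement):
    """Temporarily replace obj.<name>, restoring the original on exit."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield
    finally:
        setattr(obj, name, original)


# Usage with the patch described above (requires onnxruntime):
#   import onnxruntime as ort
#   state = ort.capi._pybind_state
#   with override_attr(state, "get_default_session_options",
#                      get_default_session_options_new):
#       ...  # sessions created here use the single-threaded defaults
```

The original getter is restored even if session creation raises, so code outside the block keeps the stock defaults.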

Hello, thank you for this suggestion. I am using SLURM and facing this problem too. I wonder where I could set sessionOptions.SetInterOpNumThreads(1); sessionOptions.SetIntraOpNumThreads(1);.
Thank you!

You can add these two options into the script where you are also initializing the ORT session.

Hoeze commented

@lkretsch doesn't this basically limit OnnxRuntime to run on a single core?

@Hoeze yes, but normally in such an application you just use one core for your job anyway, at least that's how I do it. The inference is fast enough for me with just one core.

The issue is caused by the CPU affinity set for newly created threads: when cgroups are enabled, the CPU core assigned by default may not be available to the job under the job scheduler. One solution is to override the function pthread_setaffinity_np. The C code is available from

https://raw.githubusercontent.com/wangsl/pthread-setaffinity/main/pthread-setaffinity.c

To compile the code:

gcc -fPIC -shared -Wl,-soname,libpthread-setaffinity.so -o libpthread-setaffinity.so pthread-setaffinity.c -ldl

then

export LD_PRELOAD=libpthread-setaffinity.so

Now it should work.