NVIDIA/k8s-dra-driver

kubelet-plugin Error: Segmentation fault (core dumped)

CoderTH opened this issue · 0 comments

OS: CentOS 7.9
image

I am trying to run the demo example from the repository. When I ran the ./install-dra-driver.sh script, the kubelet-plugin pod failed to start. After troubleshooting, I found that the LD_LIBRARY_PATH setting was wrong, similar to issue #4.
image
So I manually corrected this path, and that error was no longer reported. However, the pod kept restarting, and there were no related logs.
image

image image

The error still occurs.
image

image

To find out what was happening, I changed the container's run command to sleep for a while so that I could inspect the container and run nvidia-dra-plugin by hand, but the error still occurred.

image image

I still suspect that LD_LIBRARY_PATH is the problem. With the earlier, wrong setting, there was at least a log line showing the wrong path. After I set it correctly, there were no logs at all and the pod kept failing after every restart. So I manually set the wrong path again.
image
Something magical happened: the pod ran successfully, and I was able to exec into the container.

image

I manually exported LD_LIBRARY_PATH and then ran nvidia-dra-plugin, and got the following error.

What is the problem here, and what is causing this behavior?
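
For anyone trying to reproduce this, here is a minimal sketch of how one could check which entries of LD_LIBRARY_PATH actually contain a given shared library inside the container. The helper name `find_library_dirs` is hypothetical, and the library name `libnvidia-ml.so.1` is only my assumption about what the plugin needs to load; substitute whatever library is relevant.

```python
import os
from typing import List


def find_library_dirs(ld_library_path: str, lib_name: str) -> List[str]:
    """Return the LD_LIBRARY_PATH entries that contain a file named lib_name.

    Hypothetical helper for illustration; it only checks that the file exists,
    not that the dynamic linker can actually load it.
    """
    hits = []
    for entry in ld_library_path.split(":"):
        # Skip empty entries such as those produced by "a::b" or a trailing ":".
        if not entry:
            continue
        if os.path.isfile(os.path.join(entry, lib_name)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Assumed library name; not confirmed by the logs in this issue.
    lib = "libnvidia-ml.so.1"
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH =", current)
    print("entries containing", lib, "->", find_library_dirs(current, lib))
```

If the list comes back empty for the "correct" path but not for the "wrong" one (or the reverse), that would at least show which setting lets the library be found at all.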