sicxu/Deep3DFaceRecon_pytorch

[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped)

yuzhou164 opened this issue · 25 comments

Great work. I'm having a problem running the inference code.
First, I hit the error fatal error: EGL/egl.h: No such file or directory.
I fixed that by installing a package with apt-get install libegl1-mesa-dev.
Then I ran into another problem:
[F glutil.cpp:338] eglInitialize() failed
Aborted (core dumped)
Do you have any ideas on how to fix this?

My Linux Environment:
cudnn: 7.6
cuda: 10.2
python: 3.6
pytorch: 1.6.0

Hi, the problem seems to be caused by some OpenGL-related libraries required by nvdiffrast. Could you try following the Dockerfile provided by nvdiffrast to install the dependencies, then re-install nvdiffrast and see if it works?

Thanks for your answer; the problem has been solved. I can close this issue.

Hi, glad to hear that you have solved the issue. Could you post the extra dependencies you have installed?
We can add them to our installation instruction and help newcomers to build the environment from scratch.

nvdiffrast.zip
Hi, this ZIP package includes a Dockerfile and a requirement.txt for this Deep3DFaceRecon_pytorch project.
The command to build the environment is docker build -f docker/Dockerfile -t name:tagname .
One thing to watch is that the CUDA version must be compatible with the NVIDIA driver version. Please refer to the NVIDIA website.

Okay. Thanks for your kind reply.

@YuDeng Hi, I also encountered [F glutil.cpp:338] eglInitialize() failed Aborted. My system is CentOS. Did you run the code on CentOS? CentOS is reportedly problematic when using nvdiffrast.

@yuzhou164 Hi, could you share your CUDA version and NVIDIA driver version?

Do I have to run in the Docker environment? What is the command to do that? Sorry, I'm not an expert in Docker. If I just build it and then try to run the test.py script, I still get the eglInitialize() failed error.

Hello, I also ran into the eglInitialize() failed error, and the approach above did not solve it. Is there a good way to fix this?

My Linux environment:
cudnn: 8.6
cuda: 11.3
python: 3.7
pytorch: 1.9.0

I also encountered this problem when running the code on our lab server. The cause was that the server was missing some NVIDIA driver packages. Try installing the missing packages, or just reinstall the whole driver.
See:
https://github.com/NVlabs/nvdiffrast/issues/51#issuecomment-954275825
https://github.com/NVlabs/nvdiffrast/issues/56#issuecomment-983690874
https://github.com/NVlabs/nvdiffrast/issues/24#issuecomment-824702760

@YuDeng hello, I also encountered [F glutil.cpp:338] eglInitialize() failed when running the inference code. My system environment is NVIDIA driver 450.80.02 and CUDA 11.0. I already have a basic PyTorch environment, so I only installed the OpenGL-related libraries in my system environment (not a Docker environment) as suggested above. Specifically, I did the following steps:

  1. I organized the contents of the Dockerfile (provided by @yuzhou164) into an environment.sh:
apt-get update && apt-get install -y --no-install-recommends \
    pkg-config \
    libglvnd0 \
    libgl1 \
    libglx0 \
    libegl1 \
    libgles2 \
    libglvnd-dev \
    libgl1-mesa-dev \
    libegl1-mesa-dev \
    libgles2-mesa-dev \
    cmake \
    curl \
    libsm6 \
    libxext6 \
    libxrender-dev

# export PYTHONDONTWRITEBYTECODE=1
export PYTHONUNBUFFERED=1

# for GLEW
export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH

# nvidia-container-runtime
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics

# Default pyopengl to EGL for good headless rendering support
export PYOPENGL_PLATFORM=egl

cp docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

pip install --upgrade pip
pip install ninja imageio imageio-ffmpeg
  2. Run this environment.sh:
sudo bash environment.sh
  3. Re-install nvdiffrast:
cd nvdiffrast    # ./Deep3DFaceRecon_pytorch/nvdiffrast
pip install .

I want to know whether this is really caused by the NVIDIA driver version or by some other problem.
Looking forward to your reply.
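
One detail in the steps above may be worth checking: the export lines live in a script that is run as a separate process (sudo bash environment.sh), so the variables they set do not reach the shell that later launches Python. The sketch below only demonstrates that scoping rule with a child Python process standing in for the script. The helper name child_export_reaches_parent is hypothetical, and whether any of these variables matters for nvdiffrast outside Docker is not established in this thread.

```python
import os
import subprocess
import sys


def child_export_reaches_parent(name: str, value: str) -> bool:
    """Set `name` inside a child process and report whether the parent sees it."""
    os.environ.pop(name, None)  # start from a clean state in the parent
    # The child plays the role of `bash environment.sh`: it sets the variable
    # in its own environment and then exits.
    child_code = f"import os; os.environ[{name!r}] = {value!r}"
    subprocess.run([sys.executable, "-c", child_code], check=True)
    return os.environ.get(name) == value


if __name__ == "__main__":
    leaked = child_export_reaches_parent("PYOPENGL_PLATFORM", "egl")
    print("visible in parent after the child exited:", leaked)
    # A variable has to be set in the process that needs it (or in its parent
    # shell) to take effect, for example:
    os.environ["PYOPENGL_PLATFORM"] = "egl"
```

If the variables are needed, sourcing the exports in the interactive shell (or setting them in the launching process as above) is one way to make them visible.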

According to NVlabs/nvdiffrast#56 (comment), I successfully resolved this problem by reinstalling the NVIDIA driver without -no-opengl-files.

Hi, may I ask how to use the docker command?

May I ask how to reinstall the NVIDIA driver without -no-opengl-files? Could you provide more details?
Looking forward to your assistance!

Hi, I downloaded the matching NVIDIA driver from https://www.nvidia.cn/Download/index.aspx?lang=cn and installed it on my machine without -no-opengl-files.

Thank you very much for your reply.
Excuse me, I'm new to this. Could you explain more about how you "installed it without -no-opengl-files"? Is there a tutorial for reinstalling the NVIDIA driver this way?
Thanks.

@ChenVoid, this blog post (https://blog.csdn.net/weixin_43925119/article/details/109808670) may help you. When installing the NVIDIA driver, I changed the command "sudo ./NVIDIA-Linux-x86_64-430.14.run -no-x-check -no-nouveau-check -no-opengl-files" to "sudo ./NVIDIA-Linux-x86_64-430.14.run".

@Rodger-Huang, thanks for your reply.
By the way, do I need to reinstall CUDA, Anaconda, or PyTorch after reinstalling the NVIDIA driver?

I ran into the same problem. It seems the error is caused by a low-version OpenGL implementation being used (specifically, Mesa's instead of NVIDIA's).
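
To see which EGL implementations are registered on a machine, the vendor directory mentioned earlier in this thread (/usr/share/glvnd/egl_vendor.d) can be inspected without root access. The sketch below lists each JSON file there together with the library it names. It assumes the usual glvnd layout where each file has an "ICD" object with a "library_path" entry; the function name list_egl_vendors is hypothetical, and the output is only a hint, not proof of which implementation nvdiffrast ends up using.

```python
import json
from pathlib import Path


def list_egl_vendors(vendor_dir="/usr/share/glvnd/egl_vendor.d"):
    """Return (json_file_name, library_path) pairs, sorted by file name."""
    vendors = []
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return vendors  # directory missing: nothing registered here
    for json_file in sorted(directory.glob("*.json")):
        try:
            data = json.loads(json_file.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files
        icd = data.get("ICD") if isinstance(data, dict) else None
        library = icd.get("library_path") if isinstance(icd, dict) else None
        vendors.append((json_file.name, library))
    return vendors


if __name__ == "__main__":
    for file_name, library in list_egl_vendors():
        print(f"{file_name}: {library}")
```

If only a Mesa entry shows up and no NVIDIA one, that would be consistent with the explanation above, though other causes are possible.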

@SunYangtian Could you explain more? The previous solution does not work for me because I don't have permission to reinstall the NVIDIA driver.

sicxu commented

Following #108, we have updated the code and README to support the CUDA context. If you are having trouble installing OpenGL, you can add "--use_opengl False" to the script to switch to the CUDA context.
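
For readers wondering how a flag written as "--use_opengl False" can be turned into a real boolean, here is a minimal sketch. It is not the repository's actual option code: the str2bool helper, the build_parser function, and the default of True are assumptions made for illustration. The point it shows is that a plain bool() conversion would treat the string "False" as true, so an explicit string-to-bool converter is needed.

```python
import argparse


def str2bool(text: str) -> bool:
    """Convert common true/false spellings to a bool (illustrative helper)."""
    lowered = text.strip().lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # The default value here is an assumption, not taken from the repository.
    parser.add_argument("--use_opengl", type=str2bool, default=True,
                        help="use the OpenGL context; False selects the CUDA context")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--use_opengl", "False"])
    print("use_opengl =", args.use_opengl)
    # Any non-empty string is truthy, which is why type=bool would not work:
    print("bool('False') =", bool("False"))
```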

Solution

Replace all dr.RasterizeGLContext with dr.RasterizeCudaContext

I tried different images using Docker, and all failed with the same error, but this works for me. The code is in util/nvdiffrast.py, line 62. I changed it like this:

if self.ctx is None:
    self.ctx = dr.RasterizeCudaContext(device=device)
    ctx_str = "cuda"
    # if self.use_opengl:
    #     self.ctx = dr.RasterizeGLContext(device=device)
    #     ctx_str = "opengl"
    # else:
    #     self.ctx = dr.RasterizeCudaContext(device=device)
    #     ctx_str = "cuda"
    # print("create %s ctx on device cuda:%d"%(ctx_str, device.index))
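
Instead of commenting the branch out, the same choice can be kept behind a parameter. The sketch below is one possible shape, not the repository's code: make_raster_context is a hypothetical helper that takes the nvdiffrast.torch module as an argument (so it can be exercised here with a stand-in object and no GPU) and defaults to the CUDA context. Note that a try/except fallback from OpenGL to CUDA would likely not help, since the failure reported in this thread aborts the whole process rather than raising a Python exception.

```python
from types import SimpleNamespace


def make_raster_context(dr, device, use_opengl=False):
    """Create a rasterizer context from the module `dr`; returns (ctx, ctx_str)."""
    if use_opengl:
        return dr.RasterizeGLContext(device=device), "opengl"
    if not hasattr(dr, "RasterizeCudaContext"):
        # Older nvdiffrast builds may lack the CUDA rasterizer.
        raise RuntimeError("this nvdiffrast build has no RasterizeCudaContext")
    return dr.RasterizeCudaContext(device=device), "cuda"


# Stand-in for `import nvdiffrast.torch as dr`, so the sketch runs anywhere.
fake_dr = SimpleNamespace(
    RasterizeGLContext=lambda device=None: ("gl-context", device),
    RasterizeCudaContext=lambda device=None: ("cuda-context", device),
)

ctx, ctx_str = make_raster_context(fake_dr, device="cuda:0")
print("create %s ctx on device %s" % (ctx_str, "cuda:0"))
```

With the real library, the first argument would be the imported nvdiffrast.torch module and device a torch device.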

Thank you, confirmed this worked for me too.

[F glutil.cpp:338] eglInitialize() failed
I solved this with
sudo apt-get install libnvidia-gl-xxx-server
where xxx is your NVIDIA driver version, which you can find with nvidia-smi.
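
A small sketch of how the package name could be derived from the driver version, in case that helps. It assumes that xxx stands for the major part of the version (for example 450 for driver 450.80.02), which is my reading of the comment above rather than something stated in it. The function names are hypothetical, and the package may not exist for every driver branch or distribution.

```python
import re
import subprocess


def gl_package_name(driver_version: str) -> str:
    """Map a driver version such as '450.80.02' to 'libnvidia-gl-450-server'."""
    match = re.match(r"\s*(\d+)\.", driver_version)
    if match is None:
        raise ValueError(f"unrecognized driver version: {driver_version!r}")
    return f"libnvidia-gl-{match.group(1)}-server"


def query_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    try:
        version = query_driver_version()
    except (OSError, subprocess.CalledProcessError, IndexError):
        print("nvidia-smi is not available or returned no GPU")
    else:
        print("sudo apt-get install", gl_package_name(version))
```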