Fail to run ae_train

Question


obenysha opened this issue 3 years ago · 2 comments

Hello,
I followed the instructions and links regarding Headless Rendering (e.g., changing PyOpenGL as described in mcfletch/pyopengl#27), but I didn't find a solution to this issue.
Could you please take a look?

System Info

GPU model: NVIDIA Tesla T4 Virtual Workstation
Python version: 3.7.12
PyOpenGL version: pyopengl==3.1.5

Traceback (most recent call last):
  File "/opt/conda/envs/aae_py37_tf26/bin/ae_train", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/ae_train.py", line 91, in main
    dataset.get_training_images(dataset_path, args)
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/dataset.py", line 92, in get_training_images
    self.render_training_images()
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/dataset.py", line 244, in render_training_images
    bgr_x, depth_x = self.renderer.render(
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/utils.py", line 15, in decorator
    setattr(self, attribute, function(self))
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/dataset.py", line 75, in renderer
    float(self._kw['vertex_scale'])
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/meshrenderer/meshrenderer_phong.py", line 18, in __init__
    self._context = gu.OffscreenContext()
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/meshrenderer/gl_utils/egl_offscreen_context.py", line 64, in __init__
    assert eglInitialize(self._egl_display, major, minor)
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_NOT_INITIALIZED,
        baseOperation = eglInitialize,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f309d7a3c20>,
                c_long(0),
                c_long(0),
        ),
        result = 0
)

Thanks,
Omer

Answer 1 · 2022-03-14T10:45:53.000Z

It may be worth mentioning the following error, which appears when I create the VM:

Welcome to the Google Deep Learning VM
======================================

Version: common-cu113.m90
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-18-cloud-amd64 x86_64\n)

Resources:
 * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
 * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
 * Google Group: https://groups.google.com/forum/#!forum/google-dl-platform

To reinstall Nvidia driver (if needed) run:
sudo /opt/deeplearning/install-driver.sh
Linux instance-4 4.19.0-18-cloud-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

This VM requires Nvidia drivers to function correctly.   Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver.
wait apt locks released
install linux headers: linux-headers-4.19.0-18-cloud-amd64
**E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?**
Nvidia driver installed.
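The installer above hit a dpkg lock error yet still printed "Nvidia driver installed", so it may be worth confirming that the driver is actually usable before debugging EGL. A minimal sketch, assuming `nvidia-smi` comes with the driver and is on `PATH`; the helper name `nvidia_driver_status` is hypothetical. If the check fails, rerunning `sudo /opt/deeplearning/install-driver.sh` (the command suggested in the banner) once no other apt/dpkg process holds the lock might help.

```python
import shutil
import subprocess


def nvidia_driver_status():
    """Query GPU name and driver version via nvidia-smi; return (ok, details)."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH (driver probably not installed)"
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (works on Python 3.7)
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, "could not run nvidia-smi: %s" % exc
    output = (result.stdout or result.stderr).strip()
    if not output:
        output = "nvidia-smi exited with code %d" % result.returncode
    # A non-zero exit code often means nvidia-smi cannot reach the driver.
    return result.returncode == 0, output


if __name__ == "__main__":
    ok, details = nvidia_driver_status()
    print(("OK: " if ok else "FAILED: ") + details)
```

On a working VM this should print the GPU name (e.g. the Tesla T4) and the driver version; a failure message points at the driver installation rather than at ae_train.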

Answer 2 · 2022-03-14T12:33:36.000Z

Hi @obenysha,

I haven't tried running it on a Google Deep Learning VM. I would first check whether the VM can actually create an EGL context for offscreen rendering.
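One way to do that check is a short script that only initializes an EGL display, independent of auto_pose. A minimal sketch, assuming PyOpenGL's `OpenGL.EGL` bindings and the `PYOPENGL_PLATFORM=egl` environment variable; `check_egl` is a hypothetical helper. It repeats the `eglInitialize` call that fails in the traceback, so if it also reports `EGL_NOT_INITIALIZED`, the problem most likely lies in the VM's EGL/driver setup rather than in ae_train itself.

```python
import ctypes
import os

# PyOpenGL chooses its platform on first import, so request EGL before importing OpenGL.
os.environ["PYOPENGL_PLATFORM"] = "egl"


def check_egl():
    """Try to initialize the default EGL display and return (ok, message)."""
    try:
        from OpenGL import EGL
    except Exception as exc:  # PyOpenGL missing, or libEGL could not be loaded
        return False, "could not load PyOpenGL EGL bindings: %s" % exc

    try:
        display = EGL.eglGetDisplay(EGL.EGL_DEFAULT_DISPLAY)
        if not display:  # a NULL pointer means EGL_NO_DISPLAY
            return False, "eglGetDisplay returned EGL_NO_DISPLAY"
        # c_long out-parameters, matching the cArguments shown in the traceback.
        major, minor = ctypes.c_long(), ctypes.c_long()
        if not EGL.eglInitialize(display, major, minor):
            return False, "eglInitialize returned EGL_FALSE"
        version = "%d.%d" % (major.value, minor.value)
        EGL.eglTerminate(display)
    except Exception as exc:  # PyOpenGL raises EGLError, e.g. EGL_NOT_INITIALIZED
        return False, "EGL call failed: %s" % exc

    return True, "EGL %s initialized" % version


if __name__ == "__main__":
    ok, message = check_egl()
    print(("OK: " if ok else "FAILED: ") + message)
```

Running it in the same conda environment as ae_train keeps the PyOpenGL version identical, so a success here but a failure in ae_train would point back at the training setup instead.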