Fail to run ae_train
obenysha opened this issue · 2 comments
obenysha commented
Hello,
I followed the instructions and links* regarding Headless Rendering but didn't find a solution to this issue.
*e.g. patching PyOpenGL as described in mcfletch/pyopengl#27
Could you please take a look?
System Info
GPU model: NVIDIA Tesla T4 Virtual Workstation
Python version: 3.7.12
PyOpenGL version: 3.1.5
Traceback (most recent call last):
  File "/opt/conda/envs/aae_py37_tf26/bin/ae_train", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/ae_train.py", line 91, in main
    dataset.get_training_images(dataset_path, args)
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/dataset.py", line 92, in get_training_images
    self.render_training_images()
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/dataset.py", line 244, in render_training_images
    bgr_x, depth_x = self.renderer.render(
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/utils.py", line 15, in decorator
    setattr(self, attribute, function(self))
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/ae/dataset.py", line 75, in renderer
    float(self._kw['vertex_scale'])
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/meshrenderer/meshrenderer_phong.py", line 18, in __init__
    self._context = gu.OffscreenContext()
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/auto_pose/meshrenderer/gl_utils/egl_offscreen_context.py", line 64, in __init__
    assert eglInitialize(self._egl_display, major, minor)
  File "/opt/conda/envs/aae_py37_tf26/lib/python3.7/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
    err = EGL_NOT_INITIALIZED,
    baseOperation = eglInitialize,
    cArguments = (
        <OpenGL._opaque.EGLDisplay_pointer object at 0x7f309d7a3c20>,
        c_long(0),
        c_long(0),
    ),
    result = 0
)
Thanks,
Omer
obenysha commented
It may be worth mentioning the following issue, which appears when I create the VM:
Welcome to the Google Deep Learning VM
======================================
Version: common-cu113.m90
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-18-cloud-amd64 x86_64\n)
Resources:
* Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
* Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
* Google Group: https://groups.google.com/forum/#!forum/google-dl-platform
To reinstall Nvidia driver (if needed) run:
sudo /opt/deeplearning/install-driver.sh
Linux instance-4 4.19.0-18-cloud-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver.
wait apt locks released
install linux headers: linux-headers-4.19.0-18-cloud-amd64
**E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?**
Nvidia driver installed.
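The log above reports a dpkg lock error immediately before "Nvidia driver installed.", so it is not obvious that the driver was actually set up. A minimal sketch of one way to check, assuming `nvidia-smi` ends up on PATH when the driver is installed (the function name `nvidia_driver_status` is made up for this example):

```python
import shutil
import subprocess


def nvidia_driver_status(timeout=10.0):
    """Return (ok, detail): whether nvidia-smi can talk to the driver."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text output; works on Python 3.7
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, "nvidia-smi could not be run: {}".format(exc)
    if proc.returncode != 0:
        # nvidia-smi exits non-zero when it cannot reach the driver.
        return False, (proc.stderr or proc.stdout).strip()
    return True, proc.stdout.strip()


if __name__ == "__main__":
    ok, detail = nvidia_driver_status()
    print("driver OK:" if ok else "driver problem:", detail)
```

If this reports a problem, rerunning `sudo /opt/deeplearning/install-driver.sh` (as the welcome message suggests) once no other apt process holds the lock would be the obvious next step.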
MartinSmeyer commented
Hi @obenysha,
I haven't tried running it on a Google Deep Learning VM. Maybe first check whether these VMs can actually create an EGL context for offscreen rendering.
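A minimal sketch of such a check, independent of auto_pose. It calls libEGL directly through `ctypes` instead of PyOpenGL, which is a substitution for illustration and not how auto_pose creates its context. It also assumes the default display is the one to probe, so a failure here does not necessarily rule out other EGL display paths. The helper names are made up for this example:

```python
import ctypes
import ctypes.util

# Error codes from the EGL specification (subset).
EGL_ERROR_NAMES = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3008: "EGL_BAD_DISPLAY",
}


def egl_error_name(code):
    """Map an EGL error code to its name, falling back to hex."""
    return EGL_ERROR_NAMES.get(code, hex(code))


def check_egl_default_display():
    """Try eglGetDisplay + eglInitialize on the default display.

    Returns (ok, detail) and never raises if libEGL is missing.
    """
    path = ctypes.util.find_library("EGL")
    if path is None:
        return False, "libEGL not found"
    try:
        egl = ctypes.CDLL(path)
    except OSError as exc:
        return False, "could not load {}: {}".format(path, exc)

    int_p = ctypes.POINTER(ctypes.c_int)
    egl.eglGetDisplay.restype = ctypes.c_void_p
    egl.eglGetDisplay.argtypes = [ctypes.c_void_p]
    egl.eglInitialize.restype = ctypes.c_uint
    egl.eglInitialize.argtypes = [ctypes.c_void_p, int_p, int_p]
    egl.eglGetError.restype = ctypes.c_int
    egl.eglGetError.argtypes = []
    egl.eglTerminate.restype = ctypes.c_uint
    egl.eglTerminate.argtypes = [ctypes.c_void_p]

    # EGL_DEFAULT_DISPLAY is a null native display handle.
    display = egl.eglGetDisplay(None)
    if not display:  # EGL_NO_DISPLAY is a null handle as well
        return False, "eglGetDisplay returned EGL_NO_DISPLAY"

    major, minor = ctypes.c_int(0), ctypes.c_int(0)
    if not egl.eglInitialize(display, ctypes.byref(major), ctypes.byref(minor)):
        return False, "eglInitialize failed: " + egl_error_name(egl.eglGetError())

    egl.eglTerminate(display)
    return True, "EGL {}.{}".format(major.value, minor.value)


if __name__ == "__main__":
    ok, detail = check_egl_default_display()
    print("EGL OK:" if ok else "EGL problem:", detail)
```

If this fails with EGL_NOT_INITIALIZED outside of auto_pose too, the problem is most likely in the VM's driver or EGL setup rather than in the renderer code.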