pygfx/wgpu-py

Document requirements to run in cloud/remote environments

kushalkolar opened this issue · 7 comments

I think it would be useful to document which libraries (and any other requirements) are needed to run wgpu on cloud instances where the GPUs have no physical display output.

This only applies to Linux systems; I'll focus on Debian/Ubuntu-based distros for now.

EDIT: I think I found the minimal requirements to make this work; if any of these are missing, the Vulkan adapter will not show up:

xserver-xorg-core
mesa-vulkan-drivers
libvulkan1
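On Debian/Ubuntu these can be installed with `sudo apt install -y xserver-xorg-core mesa-vulkan-drivers libvulkan1`. As a quick sanity check that the Vulkan loader (provided by libvulkan1) is visible to the process, without importing wgpu, you can use only the standard library (a hedged sketch; it checks library discoverability, not that a working driver sits behind it):

```python
import ctypes.util

# Look for the Vulkan loader (provided by the libvulkan1 package on
# Debian/Ubuntu). This only checks that the shared library is discoverable,
# not that a usable ICD/driver is installed behind it.
loader = ctypes.util.find_library("vulkan")
if loader:
    print(f"Vulkan loader found: {loader}")
else:
    print("Vulkan loader NOT found -- is libvulkan1 installed?")
```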

Do you think it would be useful to have a section in the docs for running on cloud environments, maybe under platform requirements?

Kind of related: #482

I've tested that this allows pygfx etc. to run on https://codeocean.com/ and https://lambdalabs.com/service/gpu-cloud. I'll test others when I have time.

Adapter info:

Available adapters:
{'adapter_type': 'DiscreteGPU',
 'architecture': '',
 'backend_type': 'Vulkan',
 'description': '525.147.05',
 'device': 'Tesla T4',
 'device_id': 7864,
 'vendor': 'NVIDIA',
 'vendor_id': 4318}
{'adapter_type': 'CPU',
 'architecture': '',
 'backend_type': 'Vulkan',
 'description': 'Mesa 23.2.1-1ubuntu3.1~22.04.2 (LLVM 15.0.7)',
 'device': 'llvmpipe (LLVM 15.0.7, 256 bits)',
 'device_id': 0,
 'vendor': 'llvmpipe',
 'vendor_id': 65541}
{'adapter_type': 'Unknown',
 'architecture': '',
 'backend_type': 'OpenGL',
 'description': '',
 'device': 'Tesla T4/PCIe/SSE2',
 'device_id': 0,
 'vendor': '',
 'vendor_id': 4318}
Output from `print_wgpu_report()`:

██ system:

             platform:  Linux-5.4.0-1103-aws-x86_64-with-glibc2.35
python_implementation:  CPython
               python:  3.10.12

██ versions:

       wgpu:  0.15.1
       cffi:  1.15.1
jupyter_rfb:  0.4.2
      numpy:  1.26.2
      pygfx:  0.2.0
   pylinalg:  0.4.1

██ wgpu_native_info:

expected_version:  0.19.3.1
     lib_version:  0.19.3.1
        lib_path:  libwgpu_native.so

██ object_counts:

                      count  resource_mem

            Adapter:      4
          BindGroup:      4
    BindGroupLayout:      4
             Buffer:     12         1.94K
      CanvasContext:      1
      CommandBuffer:      0
     CommandEncoder:      0
 ComputePassEncoder:      0
    ComputePipeline:      0
             Device:      1
     PipelineLayout:      0
           QuerySet:      0
              Queue:      1
       RenderBundle:      0
RenderBundleEncoder:      0
  RenderPassEncoder:      0
     RenderPipeline:      3
            Sampler:      1
       ShaderModule:      3
            Texture:      4         13.7M
        TextureView:      5

              total:     43         13.7M

██ wgpu_native_counts:

                  count    mem  backend    a   k  r  e  el_size

        Adapter:      4  6.25K   vulkan:   3   3  0  0    1.98K
                                     gl:   1   1  0  0      304
      BindGroup:      4  1.47K   vulkan:   4   4  0  0      368
                                     gl:   0   0  0  0      304
BindGroupLayout:      4  1.28K   vulkan:   6   4  2  0      320
                                     gl:   0   0  0  0      232
         Buffer:     12  3.55K   vulkan:  13  12  1  0      296
                                     gl:   0   0  0  0      240
  CanvasContext:      0      0             0   0  0  0      160
  CommandBuffer:      1  1.28K   vulkan:   0   0  0  1    1.28K
                                     gl:   0   0  0  0    9.42K
ComputePipeline:      0      0   vulkan:   0   0  0  0      288
                                     gl:   0   0  0  0      280
         Device:      1  11.9K   vulkan:   1   1  0  0    11.9K
                                     gl:   0   0  0  0    10.9K
 PipelineLayout:      0      0   vulkan:   3   0  3  0      200
                                     gl:   0   0  0  0      216
       QuerySet:      0      0   vulkan:   0   0  0  0       80
                                     gl:   0   0  0  0       88
          Queue:      1    184   vulkan:   1   1  0  0      184
                                     gl:   0   0  0  0      136
   RenderBundle:      0      0   vulkan:   0   0  0  0      848
                                     gl:   0   0  0  0      848
 RenderPipeline:      3  1.68K   vulkan:   3   3  0  0      560
                                     gl:   0   0  0  0      712
        Sampler:      1     80   vulkan:   1   1  0  0       80
                                     gl:   0   0  0  0       64
   ShaderModule:      3  2.40K   vulkan:   3   3  0  0      800
                                     gl:   0   0  0  0      824
        Texture:      4  3.29K   vulkan:   5   4  1  0      824
                                     gl:   0   0  0  0      712
    TextureView:      5  1.24K   vulkan:   5   5  1  0      248
                                     gl:   0   0  0  0      216

          total:     43  34.6K

    * The a, k, r, e are allocated, kept, released, and error, respectively.
    * Reported memory does not include buffer/texture data.

██ pygfx_adapter_info:

      vendor:  NVIDIA
architecture:
      device:  Tesla T4
 description:  525.147.05
   vendor_id:  4.31K
   device_id:  7.86K
adapter_type:  DiscreteGPU
backend_type:  Vulkan

██ pygfx_features:

                                       adapter  device

                  bgra8unorm-storage:        ✓       -
               depth32float-stencil8:        ✓       -
                  depth-clip-control:        ✓       -
                  float32-filterable:        ✓       ✓
             indirect-first-instance:        ✓       -
            rg11b10ufloat-renderable:        ✓       -
                          shader-f16:        ✓       -
            texture-compression-astc:        -       -
              texture-compression-bc:        ✓       -
            texture-compression-etc2:        -       -
                     timestamp-query:        ✓       -
                   MultiDrawIndirect:        ✓       -
              MultiDrawIndirectCount:        ✓       -
                       PushConstants:        ✓       -
TextureAdapterSpecificFormatFeatures:        ✓       -
               VertexWritableStorage:        ✓       -

██ pygfx_limits:

                                                  adapter  device

                                max_bind_groups:        8       8
            max_bind_groups_plus_vertex_buffers:        0       0
                    max_bindings_per_bind_group:    1.00K   1.00K
                                max_buffer_size:  18.4E18    268M
          max_color_attachment_bytes_per_sample:        0       0
                          max_color_attachments:        0       0
          max_compute_invocations_per_workgroup:    1.02K   1.02K
                   max_compute_workgroup_size_x:    1.02K   1.02K
                   max_compute_workgroup_size_y:    1.02K   1.02K
                   max_compute_workgroup_size_z:       64      64
             max_compute_workgroup_storage_size:    49.1K   49.1K
           max_compute_workgroups_per_dimension:    65.5K   65.5K
max_dynamic_storage_buffers_per_pipeline_layout:       16      16
max_dynamic_uniform_buffers_per_pipeline_layout:       15      15
              max_inter_stage_shader_components:      128     128
               max_inter_stage_shader_variables:        0       0
          max_sampled_textures_per_shader_stage:    1.04M   1.04M
                  max_samplers_per_shader_stage:    1.04M   1.04M
                max_storage_buffer_binding_size:    2.14G   2.14G
           max_storage_buffers_per_shader_stage:    1.04M   1.04M
          max_storage_textures_per_shader_stage:    1.04M   1.04M
                       max_texture_array_layers:    2.04K   2.04K
                        max_texture_dimension1d:    32.7K   32.7K
                        max_texture_dimension2d:    32.7K   32.7K
                        max_texture_dimension3d:    16.3K   16.3K
                max_uniform_buffer_binding_size:    65.5K   65.5K
           max_uniform_buffers_per_shader_stage:    1.04M   1.04M
                          max_vertex_attributes:       32      32
                 max_vertex_buffer_array_stride:    2.04K   2.04K
                             max_vertex_buffers:       16      16
            min_storage_buffer_offset_alignment:       32      32
            min_uniform_buffer_offset_alignment:       64      64

██ pygfx_caches:

                    count  hits  misses

full_quad_objects:      1     0       2
 mipmap_pipelines:      0     0       0
          layouts:      2     0       4
         bindings:      2     0       2
   shader_modules:      2     0       2
        pipelines:      2     0       2
 shadow_pipelines:      0     0       0

██ pygfx_resources:

Texture:  6
 Buffer:  19

Available adapters on a Lambda Labs instance:
{'adapter_type': 'DiscreteGPU',
 'architecture': '',
 'backend_type': 'Vulkan',
 'description': '535.129.03',
 'device': 'Quadro RTX 6000',
 'device_id': 7728,
 'vendor': 'NVIDIA',
 'vendor_id': 4318}
{'adapter_type': 'CPU',
 'architecture': '',
 'backend_type': 'Vulkan',
 'description': 'Mesa 23.2.1-1ubuntu3.1~22.04.2 (LLVM 15.0.7)',
 'device': 'llvmpipe (LLVM 15.0.7, 256 bits)',
 'device_id': 0,
 'vendor': 'llvmpipe',
 'vendor_id': 65541}
{'adapter_type': 'Unknown',
 'architecture': '',
 'backend_type': 'OpenGL',
 'description': '',
 'device': 'Quadro RTX 6000/PCIe/SSE2',
 'device_id': 0,
 'vendor': '',
 'vendor_id': 4318}

Performance is really good!

code_ocean.mp4

This is enough for software rendering:

sudo apt install -y libegl1-mesa libgl1-mesa-dri libxcb-xfixes0-dev mesa-vulkan-drivers

I don't know about using GPUs on hosted environments... Usually they're locked down or containerized and it requires setup steps specific to the hosting environment.

> I don't know about using GPUs on hosted environments... Usually they're locked down or containerized and it requires setup steps specific to the hosting environment.

They usually come pre-loaded with NVIDIA drivers and CUDA libs, and we have gotten it to work in containers. I'll check whether the same three dependencies are enough on a few other major providers; that should be enough guidance for most users.

> Do you think it would be useful to have a section in the docs for running on cloud environments, maybe under platform requirements?

I think a section on that page makes sense. The "platform requirements" section should focus on desktop/local usage, so add a new h2 after it for "Cloud compute", with two subheadings: one for "with GPU" and one for "software rendering" (which would be what is now "Installing LavaPipe on Linux").

I can confirm that with xserver-xorg-core, mesa-vulkan-drivers, and libvulkan1 installed, fastplotlib is now working in the Allen Institute's Code Ocean environment. Thanks for figuring this out!

> I can confirm that with xserver-xorg-core, mesa-vulkan-drivers, and libvulkan1 installed, fastplotlib is now working in the Allen Institute's Code Ocean environment. Thanks for figuring this out!

I recommend checking that the hardware Vulkan adapter is at the top, to make sure you're not using lavapipe (software rendering):

import wgpu
import pprint

# Print the info dict for every adapter wgpu can see; a hardware
# Vulkan adapter (DiscreteGPU) should be listed, ideally first.
for a in wgpu.gpu.enumerate_adapters():
    pprint.pprint(a.request_adapter_info())

You should get something like this:

{'adapter_type': 'DiscreteGPU',
 'architecture': '',
 'backend_type': 'Vulkan',
 'description': '525.147.05',
 'device': 'Tesla T4',
 'device_id': 7864,
 'vendor': 'NVIDIA',
 'vendor_id': 4318}
{'adapter_type': 'CPU',
 'architecture': '',
 'backend_type': 'Vulkan',
 'description': 'Mesa 23.2.1-1ubuntu3.1~22.04.2 (LLVM 15.0.7)',
 'device': 'llvmpipe (LLVM 15.0.7, 256 bits)',
 'device_id': 0,
 'vendor': 'llvmpipe',
 'vendor_id': 65541}
{'adapter_type': 'Unknown',
 'architecture': '',
 'backend_type': 'OpenGL',
 'description': '',
 'device': 'Tesla T4/PCIe/SSE2',
 'device_id': 0,
 'vendor': '',
 'vendor_id': 4318}

(next release of fastplotlib and current fastplotlib@main will also display all adapters and indicate the default adapter when you import)
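To make that check programmatic, here's a hypothetical helper (not part of wgpu or fastplotlib) that operates on adapter-info dicts shaped like the output above:

```python
def has_hardware_vulkan(adapter_infos):
    """Return True if any adapter is a hardware (non-CPU) Vulkan adapter.

    Each entry is a dict shaped like the adapter info printed above.
    """
    return any(
        info.get("backend_type") == "Vulkan" and info.get("adapter_type") != "CPU"
        for info in adapter_infos
    )

# Abridged adapter dicts from the Tesla T4 output above:
adapters = [
    {"adapter_type": "DiscreteGPU", "backend_type": "Vulkan", "device": "Tesla T4"},
    {"adapter_type": "CPU", "backend_type": "Vulkan", "device": "llvmpipe"},
]
print(has_hardware_vulkan(adapters))  # True -> not limited to lavapipe
```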

Update: works very well on codeocean and lambdalabs, with high performance via jupyter-rfb. It also works on Google Cloud, but the rfb performance there makes it unusable.
I couldn't get it working on AWS SageMaker; I even tried installing kde-plasma-desktop, and tried installing wgpu from both pip and conda. nvidia-smi worked, so the NVIDIA drivers were installed, IDK ¯\_(ツ)_/¯ .

I'll add general guidance to the docs that you need <...> system packages installed, but your mileage may vary.

@jsiegle I forgot to mention, you'll also want these apt packages, which make a huge difference in rfb performance:

libjpeg-turbo8-dev libturbojpeg0-dev

And simplejpeg via pip.
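A quick way to confirm simplejpeg is importable after the pip install (hedged: in my experience jupyter_rfb picks it up automatically when present):

```python
import importlib.util

# jupyter_rfb can use simplejpeg for much faster JPEG frame encoding,
# which is where the rfb performance gain comes from.
spec = importlib.util.find_spec("simplejpeg")
print("simplejpeg available" if spec else "simplejpeg missing: pip install simplejpeg")
```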