aclysma/rafx

crash when running demo in Linux

Closed this issue · 16 comments

I'm getting a crash when trying to run the demo in Linux with the vulkan backend. This is the backtrace:

thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 3', rafx-plugins/src/pipelines/modern/graph_generator/mod.rs:254:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/panicking.rs:107:14
   2: core::panicking::panic_bounds_check
             at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/panicking.rs:75:5
   3: rafx_plugins::pipelines::modern::graph_generator::generate_render_graph
   4: <rafx_plugins::pipelines::modern::plugin::ModernPipelineRendererPlugin as rafx_renderer::renderer_pipeline_plugin::RendererPipelinePlugin>::generate_render_graph
   5: rafx_renderer::renderer::Renderer::start_rendering_next_frame
   6: demo::DemoApp::update
   7: demo::update_loop::{{closure}}
   8: winit::platform_impl::platform::x11::EventLoop<T>::run
   9: winit::platform_impl::platform::EventLoop<T>::run
  10: winit::event_loop::EventLoop<T>::run
  11: demo::update_loop
  12: demo::main_native::main_native
  13: demo::main

I'm running an up-to-date Manjaro Linux environment with the Budgie DE, and I'm running an AMD RX570 GPU.

I wasn't able to reproduce a crash, but I did trigger some validation errors on Vulkan/Windows 10 when I switched to a 2D scene. I probably introduced this recently. In case this is something different, if you're willing to help find where this is coming from, could you capture a more detailed stack by setting the environment variable RUST_BACKTRACE=full and running in debug mode? The odd thing about this stack is that I wouldn't expect this logic to differ across platforms or graphics APIs.

Also, could you confirm whether this happened immediately on launch or only after you changed scenes (arrow keys)?

(While I know people have run rafx successfully in linux, I personally only use macOS/windows, so I may have a hard time helping much with linux issues)

Okay, here is the output of running in debug mode with RUST_BACKTRACE=full:

[2022-02-02T16:42:54.461768520Z INFO  winit::platform_impl::platform::x11::window] Guessed window scale factor: 1
[2022-02-02T16:42:54.461794449Z DEBUG winit::platform_impl::platform::x11::window] Calculated physical dimensions: 1600x900
[2022-02-02T16:42:54.462283810Z DEBUG demo] calling init
[2022-02-02T16:42:54.462910590Z INFO  rafx_api::backends::vulkan::api] Validation mode: EnabledIfAvailable
[2022-02-02T16:42:54.462922031Z INFO  rafx_api::backends::vulkan::api] Link method for vulkan: Dynamic
[2022-02-02T16:42:54.465197875Z INFO  rafx_api::backends::vulkan::internal::instance] Found Vulkan version: (1, 2, 202)
[2022-02-02T16:42:54.466522719Z DEBUG rafx_api::backends::vulkan::internal::instance] Available Layers: [
    LayerProperties {
        layer_name: "VK_LAYER_VALVE_steam_overlay_64",
        spec_version: 4202632,
        implementation_version: 1,
        description: "Steam Overlay Layer",
    },
    LayerProperties {
        layer_name: "VK_LAYER_VALVE_steam_overlay_32",
        spec_version: 4202632,
        implementation_version: 1,
        description: "Steam Overlay Layer",
    },
    LayerProperties {
        layer_name: "VK_LAYER_VALVE_steam_fossilize_32",
        spec_version: 4202632,
        implementation_version: 1,
        description: "Steam Pipeline Caching Layer",
    },
    LayerProperties {
        layer_name: "VK_LAYER_VALVE_steam_fossilize_64",
        spec_version: 4202632,
        implementation_version: 1,
        description: "Steam Pipeline Caching Layer",
    },
]
[2022-02-02T16:42:54.490468782Z DEBUG rafx_api::backends::vulkan::internal::instance] Available Extensions: [
    ExtensionProperties {
        extension_name: "VK_KHR_device_group_creation",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_display",
        spec_version: 23,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_external_fence_capabilities",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_external_memory_capabilities",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_external_semaphore_capabilities",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_get_display_properties2",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_get_physical_device_properties2",
        spec_version: 2,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_get_surface_capabilities2",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_surface",
        spec_version: 25,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_surface_protected_capabilities",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_wayland_surface",
        spec_version: 6,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_xcb_surface",
        spec_version: 6,
    },
    ExtensionProperties {
        extension_name: "VK_KHR_xlib_surface",
        spec_version: 6,
    },
    ExtensionProperties {
        extension_name: "VK_EXT_acquire_drm_display",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_EXT_acquire_xlib_display",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_EXT_debug_report",
        spec_version: 10,
    },
    ExtensionProperties {
        extension_name: "VK_EXT_direct_mode_display",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_EXT_display_surface_counter",
        spec_version: 1,
    },
    ExtensionProperties {
        extension_name: "VK_EXT_debug_utils",
        spec_version: 2,
    },
]
[2022-02-02T16:42:54.490548462Z WARN  rafx_api::backends::vulkan::internal::instance] Could not find an appropriate validation layer. Check that the vulkan SDK has been installed or disable validation.
[2022-02-02T16:42:54.490556768Z DEBUG rafx_api::backends::vulkan::internal::instance] Using layers: []
[2022-02-02T16:42:54.490561827Z DEBUG rafx_api::backends::vulkan::internal::instance] Using extensions: ["VK_KHR_surface", "VK_KHR_xlib_surface"]
[2022-02-02T16:42:54.490580332Z INFO  rafx_api::backends::vulkan::internal::instance] Creating vulkan instance
[2022-02-02T16:42:54.494707761Z INFO  rafx_api::backends::vulkan::internal::instance] Seting up vulkan debug callback
thread 'main' panicked at 'Unable to load create_debug_utils_messenger_ext', /home/billydm/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/vk/extensions.rs:10475:21
stack backtrace:
   0:     0x563710b8d54c - std::backtrace_rs::backtrace::libunwind::trace::h093d4af0eabdfc15
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x563710b8d54c - std::backtrace_rs::backtrace::trace_unsynchronized::h2b90813d74c759ca
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x563710b8d54c - std::sys_common::backtrace::_print_fmt::hfaa8856bf3eca13f
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x563710b8d54c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0cbaef3adcb5a454
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x563710bb1a3c - core::fmt::write::h35a8eb836b847360
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/fmt/mod.rs:1149:17
   5:     0x563710b854d5 - std::io::Write::write_fmt::h45f2b8390f189782
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/io/mod.rs:1697:15
   6:     0x563710b8f170 - std::sys_common::backtrace::_print::h56f62073b0e62985
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x563710b8f170 - std::sys_common::backtrace::print::h152fba05ec38941b
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x563710b8f170 - std::panicking::default_hook::{{closure}}::ha3121a0b8482251f
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:211:50
   9:     0x563710b8ed25 - std::panicking::default_hook::hde5d78c11ae3b8f6
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:228:9
  10:     0x563710b8f824 - std::panicking::rust_panic_with_hook::he6f55c3e7ed1777c
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:606:17
  11:     0x5637109e85c4 - std::panicking::begin_panic::{{closure}}::hb185d96aabcf7239
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:526:9
  12:     0x5637109e859c - std::sys_common::backtrace::__rust_end_short_backtrace::h595d76edcc38be45
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/sys_common/backtrace.rs:139:18
  13:     0x56370eec5d5c - std::panicking::begin_panic::h099a7be50e1d537f
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:525:12
  14:     0x5637109e7699 - ash::vk::extensions::ExtDebugUtilsFn::load::create_debug_utils_messenger_ext::h9469d6eaf2c3921a
                               at /home/billydm/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/vk/extensions.rs:10475:21
  15:     0x5637109efae5 - ash::vk::extensions::ExtDebugUtilsFn::create_debug_utils_messenger_ext::heda798023487e7fb
                               at /home/billydm/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/vk/extensions.rs:10599:9
  16:     0x5637109efae5 - ash::extensions::ext::debug_utils::DebugUtils::create_debug_utils_messenger::hb39fbe349cee055c
                               at /home/billydm/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.32.1/src/extensions/ext/debug_utils.rs:109:9
  17:     0x5637108acef4 - rafx_api::backends::vulkan::internal::instance::VkInstance::setup_vulkan_debug_callback::hb91bb9864337c973
                               at /home/billydm/Downloads/rafx/rafx-api/src/backends/vulkan/internal/instance.rs:237:22
  18:     0x56371088ff68 - rafx_api::backends::vulkan::internal::instance::VkInstance::new::hd731a1de8babb276
                               at /home/billydm/Downloads/rafx/rafx-api/src/backends/vulkan/internal/instance.rs:175:18
  19:     0x56371079b522 - rafx_api::backends::vulkan::api::RafxApiVulkan::new::hc674850bec6d402d
                               at /home/billydm/Downloads/rafx/rafx-api/src/backends/vulkan/api.rs:118:24
  20:     0x563710891fcb - rafx_api::api::RafxApi::new_vulkan::h621b36d81dbd0cdd
                               at /home/billydm/Downloads/rafx/rafx-api/src/api.rs:102:24
  21:     0x563710891f0c - rafx_api::api::RafxApi::new::hc3bd88469f287054
                               at /home/billydm/Downloads/rafx/rafx-api/src/api.rs:74:20
  22:     0x56370f0e681f - demo::init::rendering_init::h0d26a748f300d21e
                               at /home/billydm/Downloads/rafx/demo/src/init.rs:100:29
  23:     0x56370f005882 - demo::DemoApp::init::h41cbadeed0b1d634
                               at /home/billydm/Downloads/rafx/demo/src/lib.rs:162:9
  24:     0x56370f00a1e4 - demo::update_loop::h9e1ac4812faa6796
                               at /home/billydm/Downloads/rafx/demo/src/lib.rs:574:19
  25:     0x56370ef54452 - demo::main_native::main_native::hd11678f5e171c606
                               at /home/billydm/Downloads/rafx/demo/src/main_native.rs:13:5
  26:     0x56370eed63e4 - demo::main::hc519d3f5e09cc4d3
                               at /home/billydm/Downloads/rafx/demo/src/main.rs:51:5
  27:     0x56370eed858b - core::ops::function::FnOnce::call_once::h4078af3797fcf5eb
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/ops/function.rs:227:5
  28:     0x56370eed6e7e - std::sys_common::backtrace::__rust_begin_short_backtrace::hfb7c49150fec602e
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/sys_common/backtrace.rs:123:18
  29:     0x56370eedbed1 - std::rt::lang_start::{{closure}}::hb518f1b3c2d821e1
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/rt.rs:145:18
  30:     0x563710b8d1db - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h7422298f811ee14d
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/ops/function.rs:259:13
  31:     0x563710b8d1db - std::panicking::try::do_call::hcba55cf6d5b5533e
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:406:40
  32:     0x563710b8d1db - std::panicking::try::h0b2a05128a4ee609
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:370:19
  33:     0x563710b8d1db - std::panic::catch_unwind::he1deef49e02fb06c
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panic.rs:133:14
  34:     0x563710b8d1db - std::rt::lang_start_internal::{{closure}}::hf44e73ef18e45ffd
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/rt.rs:128:48
  35:     0x563710b8d1db - std::panicking::try::do_call::h894daf8a782b48f4
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:406:40
  36:     0x563710b8d1db - std::panicking::try::hd3e4f8d86f3a7fb5
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panicking.rs:370:19
  37:     0x563710b8d1db - std::panic::catch_unwind::h2e69404746fb3d50
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/panic.rs:133:14
  38:     0x563710b8d1db - std::rt::lang_start_internal::hec7f1b06f38d8409
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/rt.rs:128:20
  39:     0x56370eedbea0 - std::rt::lang_start::h91ddb3686b37ff09
                               at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/std/src/rt.rs:144:17
  40:     0x56370eed644c - main
  41:     0x7f3a4af59b25 - __libc_start_main
  42:     0x56370eed4c6e - _start
  43:                0x0 - <unknown>

Ah, so apparently I'm getting the same error as #180

That's a different crash. It means the Vulkan SDK is not installed, so the validation layers won't be found. I can look at this more later.

Okay, after installing the vulkan-validation-layers package I get this rather long output:
https://pastebin.com/dHLqcs6D

The first error appears to happen on line 1546

Thanks for the log, I see exactly what's going on now:

  1. There is a validation error "groupCountX (792580) exceeds device limit maxComputeWorkGroupCount[0] (65535). The Vulkan spec states: groupCountX must be less than or equal to VkPhysicalDeviceLimits::maxComputeWorkGroupCount[0]". As the code is written now, it requires support for larger compute work group counts than your driver/GPU supports. (the modern pipeline is intended for recent, mid-high spec discrete GPUs that are somewhat equivalent to a PS5 that can do bindless/GPU-driven rendering). Maybe this can be improved on my end, but maybe not. I need to look into it more. At minimum, there needs to be a clear error message for this.
  2. Following this we take an error path with a trivially fixable problem. I have a fix for this, but it won't resolve the first issue.
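As a rough illustration of the "clear error message" mentioned in point 1, the sketch below checks a requested dispatch size against the device limits before submitting it. This is not rafx code: `DispatchLimitError` and `check_dispatch` are hypothetical names, and the caller is assumed to have already read the three `maxComputeWorkGroupCount` values from the device.

```rust
use std::fmt;

/// Hypothetical error type for illustration only; not part of rafx.
#[derive(Debug, PartialEq)]
struct DispatchLimitError {
    axis: usize,
    requested: u32,
    limit: u32,
}

impl fmt::Display for DispatchLimitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "compute dispatch requests {} work groups on axis {}, but the device limit \
             maxComputeWorkGroupCount[{}] is {}",
            self.requested, self.axis, self.axis, self.limit
        )
    }
}

/// Compares each axis of a requested dispatch against the device limit for that axis.
/// `max_group_counts` is assumed to come from VkPhysicalDeviceLimits::maxComputeWorkGroupCount.
/// Returns the first axis that exceeds its limit as an error.
fn check_dispatch(
    group_counts: [u32; 3],
    max_group_counts: [u32; 3],
) -> Result<(), DispatchLimitError> {
    for axis in 0..3 {
        if group_counts[axis] > max_group_counts[axis] {
            return Err(DispatchLimitError {
                axis,
                requested: group_counts[axis],
                limit: max_group_counts[axis],
            });
        }
    }
    Ok(())
}
```

With the values from the validation message (792580 groups on X against a limit of 65535), `check_dispatch([792580, 1, 1], limits)` would return an error naming axis 0, which could be logged before the dispatch is ever recorded.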

Will look more later. Thanks for posting the additional logging. For now you may be able to run the basic pipeline. Change demo/Cargo.toml to comment out the default feature "modern-pipeline" and uncomment "basic-pipeline".

(Actually looks like this is more of an AMD limit than "new vs. old", so I'm almost certain there is a simple workaround for this, like executing workgroups as a 2D array rather than a 1D array)

https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupCount[0]&platform=all
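A minimal sketch of the 2D-dispatch idea, assuming the shader can rebuild a linear group index from its X and Y work group IDs and exit early when that index is past the real total. The function names here are hypothetical, and this is not necessarily the fix that was later pushed to rafx.

```rust
/// Splits a 1D work group count into an (x, y) grid in which neither dimension
/// exceeds `max_per_dim`. The grid may contain more groups than `total_groups`,
/// so the shader must skip any group whose linear index is >= `total_groups`.
/// Returns None if the input is zero or the total cannot fit in a 2D grid.
fn split_dispatch_2d(total_groups: u32, max_per_dim: u32) -> Option<(u32, u32)> {
    if total_groups == 0 || max_per_dim == 0 {
        return None;
    }
    // Use as many groups along X as the limit allows.
    let x = total_groups.min(max_per_dim);
    // Ceiling division, written so it cannot overflow for total_groups >= 1.
    let y = (total_groups - 1) / x + 1;
    if y > max_per_dim {
        None
    } else {
        Some((x, y))
    }
}

/// CPU-side mirror of the index reconstruction the shader would perform,
/// e.g. from gl_WorkGroupID.x, gl_WorkGroupID.y and gl_NumWorkGroups.x in GLSL.
fn linear_group_index(group_id_x: u32, group_id_y: u32, groups_x: u32) -> u64 {
    u64::from(group_id_y) * u64::from(groups_x) + u64::from(group_id_x)
}
```

For the dispatch in the log, `split_dispatch_2d(792580, 65535)` gives a 65535 x 13 grid. That grid covers 851955 groups, so the trailing groups whose linear index is 792580 or higher need to return without doing any work.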

Using the basic-pipeline isn't working either. It's not crashing this time but it is stuck on a black screen.

Here is the output using the basic-pipeline:
https://pastebin.com/rWXgWr6p

I should have mentioned, changing pipelines requires re-importing assets. Could you delete /demo/.assets_db and try again?

Okay, well this is interesting.

After deleting /demo/.assets_db, it appears to run while in debug mode, albeit while constantly spamming this error in the console:

[2022-02-02T22:59:31.769397066Z ERROR rafx_api::backends::vulkan::internal::debug_reporter] "Validation Error: [ VUID-vkFreeCommandBuffers-pCommandBuffers-00047 ] Object 0: handle = 0x7bc42805fc10, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x1ab902fc | Attempt to free VkCommandBuffer 0x7bc42805fc10[] which is in use. The Vulkan spec states: All elements of pCommandBuffers must not be in the pending state (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-vkFreeCommandBuffers-pCommandBuffers-00047)"
[2022-02-02T22:59:31.796223363Z ERROR rafx_api::backends::vulkan::internal::debug_reporter] "Validation Error: [ VUID-vkFreeCommandBuffers-pCommandBuffers-00047 ] Object 0: handle = 0x7bc42804b210, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x1ab902fc | Attempt to free VkCommandBuffer 0x7bc42804b210[] which is in use. The Vulkan spec states: All elements of pCommandBuffers must not be in the pending state (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-vkFreeCommandBuffers-pCommandBuffers-00047)"
[2022-02-02T22:59:31.824304190Z ERROR rafx_api::backends::vulkan::internal::debug_reporter] "Validation Error: [ VUID-vkFreeCommandBuffers-pCommandBuffers-00047 ] Object 0: handle = 0x7bc4281cf8f0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x1ab902fc | Attempt to free VkCommandBuffer 0x7bc4281cf8f0[] which is in use. The Vulkan spec states: All elements of pCommandBuffers must not be in the pending state (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-vkFreeCommandBuffers-pCommandBuffers-00047)"

But then when I attempt to run it in release mode, the entire screen becomes corrupted and I need to restart my computer to fix it:
[attached photo: IMG_20220202_165654]

This is the output of running in debug mode after deleting /demo/.assets_db:
https://pastebin.com/FTB7wMZz

Ha, amazing. I have an AMD card I can try running on, maybe this will repro in windows.

Pushed a couple fixes that may help with running the modern pipeline. I'm still a little concerned about some validation errors in your logs that seem to be related to texture upload that I haven't seen before. But the group count issue was really great to find/fix, thanks for helping find that!

If you want to retry, I'd be curious if you get further on the modern pipeline (don't forget to delete .assets_db if you retry).

Hmm, it's still not working (this is the log using the modern pipeline). https://pastebin.com/Y3nGhF6v

I may be encountering a similar issue where rotating_frame_index is 3 but static_resources.tonemap_debug_output (and static_resources.mesh_culling_debug_output) contains only 3 values. This is when running demo with the vulkan backend.

thread 'main' panicked at rafx-plugins/src/pipelines/modern/graph_generator/mod.rs:265:46:
index out of bounds: the len is 3 but the index is 3

Adding a modulus operator (% 3) to the index allows the demo to run correctly. (Great demos, btw.)
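The workaround described above can be sketched as follows. This is a standalone illustration rather than the code in graph_generator/mod.rs: `resource_for_frame` is a hypothetical helper, and it wraps by the slice length instead of a hard-coded 3 so it keeps working if the number of per-frame resources changes.

```rust
/// Picks the per-frame resource for `rotating_frame_index`, wrapping the index
/// so that it always falls inside the slice. Returns None for an empty slice.
///
/// Note that wrapping only avoids the panic. If the frame index can legitimately
/// reach `resources.len()`, the two counts disagree somewhere upstream, and that
/// mismatch may still be worth fixing.
fn resource_for_frame<T>(resources: &[T], rotating_frame_index: usize) -> Option<&T> {
    if resources.is_empty() {
        return None;
    }
    resources.get(rotating_frame_index % resources.len())
}
```

With three entries, an index of 3 wraps back to entry 0 instead of panicking with "the len is 3 but the index is 3".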

I'm hoping what I committed on June 28th fixed this. Feel free to re-open if not!