gfx-rs/wgpu-rs

Segmentation fault in safe code when sharing a `Device` across threads

tasgon opened this issue · 6 comments

The small snippet below reliably triggers a segmentation fault without using `unsafe`:

fn main() {
    let (_tx, rx) = std::sync::mpsc::channel::<()>();
    let instance = wgpu::Instance::new(wgpu::BackendBit::VULKAN);

    let adapter_options = wgpu::RequestAdapterOptions {
        power_preference: wgpu::PowerPreference::default(),
        compatible_surface: None,
    };

    let adapter = pollster::block_on(async { instance.request_adapter(&adapter_options).await.unwrap() });

    let (device, _queue) = pollster::block_on(async {
        adapter
            .request_device(&wgpu::DeviceDescriptor::default(), None)
            .await
            .unwrap()
    });
    let device = std::sync::Arc::new(device);

    let dev = std::sync::Arc::clone(&device);
    let _handle = std::thread::spawn(move || {
        let _block_until_tx_drop = rx.recv();
        println!("{:?}", dev);
    });
}

A full reproducible example can be found here; it was extracted from my repository here. Wrapping `dev` in a `ManuallyDrop` before passing it to the thread resolves the problem, so I suspect it is related to how the device is dropped. The bug has been reproduced on two different Linux machines:

  • Arch Linux, GTX 1070, nvidia 460.67
  • Ubuntu 18.04, GTX 1050, nvidia 460.67

On my machine (Arch), the bug occurs in both debug and release builds. Another person tested the snippet on the following platforms and could not reproduce it, though undefined behavior may still be occurring there:

  • Windows 10, Radeon Pro 560X, driver version 26.20.13001.33012
  • Arch Linux, Intel UHD Graphics 620, mesa 20.3.4-3
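
The `ManuallyDrop` workaround mentioned above can be illustrated without a GPU. The following is a minimal sketch, assuming a hypothetical stand-in type `FakeDevice` in place of `wgpu::Device` (the names `FakeDevice`, `DROPS`, and `run_with_manually_drop` are illustrative and not part of wgpu). It shows only that wrapping the thread's `Arc` in `ManuallyDrop` keeps the reference count from reaching zero, so the destructor never runs; it does not reproduce the crash itself.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Counts how many times the stand-in's destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `wgpu::Device`; not a wgpu type.
#[derive(Debug)]
struct FakeDevice;

impl Drop for FakeDevice {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Shares the device with a worker thread, wrapping the worker's handle in
/// `ManuallyDrop`, and returns how many destructor runs were observed after
/// both handles went out of use.
fn run_with_manually_drop() -> usize {
    let device = Arc::new(FakeDevice);
    let dev = ManuallyDrop::new(Arc::clone(&device));

    let handle = thread::spawn(move || {
        // The handle is still usable through `Deref`.
        let _ = format!("{:?}", *dev);
        // `dev` goes out of scope here, but `ManuallyDrop` suppresses the
        // `Arc`'s drop, so the reference count is not decremented.
    });
    handle.join().unwrap();

    // Releases the main thread's handle; one strong reference is leaked,
    // so `FakeDevice::drop` is never called.
    drop(device);
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    println!("destructor runs: {}", run_with_manually_drop());
}
```

The trade-off is that the device is leaked rather than destroyed, which is consistent with the suspicion that the crash happens during device teardown.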

kvark commented

Thank you for filing this!
I'm unable to reproduce on AMD/Linux/Vulkan.
Could you provide a stack trace, please?

#0  0x00007ffff627f670 in ?? () from /usr/lib/libnvidia-glcore.so.460.67
#1  0x00007ffff627f77c in ?? () from /usr/lib/libnvidia-glcore.so.460.67
#2  0x00007ffff627f832 in ?? () from /usr/lib/libnvidia-glcore.so.460.67
#3  0x00007ffff765b59f in ?? () from /usr/lib/libGLX_nvidia.so.0
#4  0x00007ffff767432f in ?? () from /usr/lib/libGLX_nvidia.so.0
#5  0x00007ffff79f1672 in ?? () from /usr/lib/libvulkan.so.1
#6  0x00007ffff79fc8a8 in vkDestroyDevice () from /usr/lib/libvulkan.so.1
#7  0x0000555555a367b5 in ash::vk::features::DeviceFnV1_0::destroy_device (self=0x555555fab458, device=..., p_allocator=0x0) at /home/<user>/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.31.0/src/vk/features.rs:4631
#8  0x0000555555a0b863 in ash::device::DeviceV1_0::destroy_device<ash::device::Device> (self=0x555555fab450, allocation_callbacks=...) at /home/<user>/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.31.0/src/device.rs:414
#9  0x00005555559e953d in gfx_backend_vulkan::{{impl}}::drop (self=0x555555fab450) at /home/<user>/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-backend-vulkan-0.7.0/src/lib.rs:1439
#10 0x0000555555a1cfe6 in core::ptr::drop_in_place<gfx_backend_vulkan::RawDevice> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#11 0x0000555555a13bd4 in alloc::sync::Arc<gfx_backend_vulkan::RawDevice>::drop_slow<gfx_backend_vulkan::RawDevice> (self=0x555555fab968)
    at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1039
#12 0x0000555555a1421f in alloc::sync::{{impl}}::drop<gfx_backend_vulkan::RawDevice> (self=0x555555fab968) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1571
#13 0x0000555555a1d96f in core::ptr::drop_in_place<alloc::sync::Arc<gfx_backend_vulkan::RawDevice>> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#14 0x0000555555a1d1c1 in core::ptr::drop_in_place<gfx_backend_vulkan::CommandQueue> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#15 0x0000555555a1d553 in core::ptr::drop_in_place<[gfx_backend_vulkan::CommandQueue]> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#16 0x00005555559ee6e5 in alloc::vec::{{impl}}::drop<gfx_backend_vulkan::CommandQueue,alloc::alloc::Global> (self=0x7ffff3fdff00) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2459
#17 0x0000555555a1dab6 in core::ptr::drop_in_place<alloc::vec::Vec<gfx_backend_vulkan::CommandQueue, alloc::alloc::Global>> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#18 0x0000555555a1e096 in core::ptr::drop_in_place<gfx_hal::queue::family::QueueGroup<gfx_backend_vulkan::Backend>> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#19 0x000055555585fc81 in wgpu_core::device::Device<gfx_backend_vulkan::Backend>::dispose<gfx_backend_vulkan::Backend> (self=...) at /home/<user>/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.7.1/src/device/mod.rs:2367
#20 0x0000555555807ab6 in wgpu_core::hub::Hub<gfx_backend_vulkan::Backend, wgpu_core::hub::IdentityManagerFactory>::clear<gfx_backend_vulkan::Backend,wgpu_core::hub::IdentityManagerFactory> (self=0x555555d93058, surface_guard=0x555555d93028, 
    with_adapters=true) at /home/<user>/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.7.1/src/hub.rs:726
#21 0x00005555557f8dd1 in wgpu_core::hub::{{impl}}::drop<wgpu_core::hub::IdentityManagerFactory> (self=0x555555d92f30) at /home/<user>/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.7.1/src/hub.rs:800
#22 0x00005555558f2826 in core::ptr::drop_in_place<wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#23 0x00005555558eb485 in core::ptr::drop_in_place<wgpu::backend::direct::Context> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#24 0x00005555558909a4 in alloc::sync::Arc<wgpu::backend::direct::Context>::drop_slow<wgpu::backend::direct::Context> (self=0x555555d93e70)
    at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1039
#25 0x000055555589156f in alloc::sync::{{impl}}::drop<wgpu::backend::direct::Context> (self=0x555555d93e70) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1571
#26 0x00005555558efa4f in core::ptr::drop_in_place<alloc::sync::Arc<wgpu::backend::direct::Context>> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#27 0x00005555557b8b3d in core::ptr::drop_in_place<wgpu::Device> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#28 0x00005555557b7782 in alloc::sync::Arc<wgpu::Device>::drop_slow<wgpu::Device> (self=0x7ffff3fe39a8) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1039
#29 0x00005555557b997d in alloc::sync::{{impl}}::drop<wgpu::Device> (self=0x7ffff3fe39a8) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1571
#30 0x00005555557b8c5e in core::ptr::drop_in_place<alloc::sync::Arc<wgpu::Device>> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#31 0x00005555557b8edb in core::ptr::drop_in_place<closure-2> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:179
#32 0x00005555557c7fd6 in wgpu_bug::main::{{closure}} () at /home/<user>/Code/Rust/wgpu_bug/src/main.rs:24
#33 0x00005555557c5f58 in std::sys_common::backtrace::__rust_begin_short_backtrace<closure-2,()> (f=...) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125
#34 0x00005555557c7325 in std::thread::{{impl}}::spawn_unchecked::{{closure}}::{{closure}}<closure-2,()> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:474
#35 0x00005555557bcea8 in std::panic::{{impl}}::call_once<(),closure-0> (self=..., _args=()) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:344
#36 0x00005555557c05e2 in std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure-0>,()> (data=0x7ffff3fe3af8) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379
#37 0x00005555557c069d in __rust_try ()
#38 0x00005555557c051d in std::panicking::try<(),std::panic::AssertUnwindSafe<closure-0>> (f=...) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343
#39 0x00005555557bced8 in std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (f=...) at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:431
#40 0x00005555557c7128 in std::thread::{{impl}}::spawn_unchecked::{{closure}}<closure-2,()> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:473
#41 0x00005555557b851e in core::ops::function::FnOnce::call_once<closure-0,()> () at /home/<user>/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227
#42 0x0000555555ca547a in alloc::boxed::{{impl}}::call_once<(),FnOnce<()>,alloc::alloc::Global> () at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/alloc/src/boxed.rs:1521
#43 alloc::boxed::{{impl}}::call_once<(),alloc::boxed::Box<FnOnce<()>, alloc::alloc::Global>,alloc::alloc::Global> () at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/alloc/src/boxed.rs:1521
#44 std::sys::unix::thread::{{impl}}::new::thread_start () at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0//library/std/src/sys/unix/thread.rs:71
#45 0x00007ffff7d7a299 in start_thread () from /usr/lib/libpthread.so.0
#46 0x00007ffff7ca3053 in clone () from /usr/lib/libc.so.6
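
In the trace above, `Arc<wgpu::Device>::drop_slow` is reached from the spawned thread's closure, so the teardown that ends in `vkDestroyDevice` runs on that thread rather than on the main thread. The following is a minimal sketch of that ownership pattern, assuming a hypothetical stand-in type `FakeDevice` (the names `FakeDevice` and `final_drop_thread` are illustrative and not part of wgpu). Unlike the original snippet, it joins the worker so the result can be inspected. Whether this ordering is what causes the crash is not established in this thread.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread::{self, ThreadId};

// Hypothetical stand-in for `wgpu::Device`; records which thread destroyed it.
struct FakeDevice {
    dropped_on: Arc<Mutex<Option<ThreadId>>>,
}

impl Drop for FakeDevice {
    fn drop(&mut self) {
        *self.dropped_on.lock().unwrap() = Some(thread::current().id());
    }
}

/// Returns (worker thread id, id of the thread that ran the destructor).
fn final_drop_thread() -> (ThreadId, ThreadId) {
    let slot = Arc::new(Mutex::new(None));
    let (release_tx, release_rx) = mpsc::channel::<()>();

    let device = Arc::new(FakeDevice { dropped_on: Arc::clone(&slot) });
    let dev = Arc::clone(&device);

    let handle = thread::spawn(move || {
        // Block until the sender is dropped, as in the original snippet.
        let _ = release_rx.recv();
        let id = thread::current().id();
        // This is the last strong reference, so the destructor runs here.
        drop(dev);
        id
    });

    drop(device); // main thread's handle goes first (count 2 -> 1)
    drop(release_tx); // wakes the worker, which now holds the last handle

    let worker_id = handle.join().unwrap();
    let dropper_id = slot.lock().unwrap().expect("destructor did not run");
    (worker_id, dropper_id)
}

fn main() {
    let (worker, dropper) = final_drop_thread();
    println!("destroyed on worker thread: {}", worker == dropper);
}
```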

kvark commented

Thank you!
Could you make sure to install the validation layers, hook up env_logger to your app, and see what they print on the output?

Device { context: Context { type: "Native" }, id: Device { id: (0, 1, Vulkan), error_sink: Mutex { data: ErrorSink }, features: (empty) } }
[2021-04-06T03:11:24Z ERROR gfx_backend_vulkan] 
    VALIDATION [VUID-vkFreeCommandBuffers-commandBufferCount-arraylength (257118683)] : Validation Error: [ VUID-vkFreeCommandBuffers-commandBufferCount-arraylength ] Object 0: handle = 0x55f3a6e51aa8, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xf5351db | vkFreeCommandBuffers: parameter commandBufferCount must be greater than 0. The Vulkan spec states: commandBufferCount must be greater than 0 (https://www.khronos.org/registry/vulkan/specs/_MAGIC_KHRONOS_SPEC_TYPE_/html/vkspec.html#VUID-vkFreeCommandBuffers-commandBufferCount-arraylength)
    object info: (type: DEVICE, hndl: 94504965446312)
    
Segmentation fault (core dumped)

kvark commented

The validation error is addressed in gfx-rs/gfx#3717.
Let's see whether this issue is resolved once that fix reaches wgpu-rs.

Closing because of a potential fix and the wgpu-rs -> wgpu migration. Please refile if this is still an issue.