gfx-rs/wgpu

Device::create_buffer_init() provides no indication that the buffer is only filled with use of Queue::submit()

omeps opened this issue · 2 comments

Description
Device::create_buffer_init() does not actually fill the created buffer; the data is only written once Queue::submit() runs, but this isn't reflected in the documentation. This doesn't affect the usual case of using the buffer on the GPU, because those code paths call Queue::submit() anyway, but it does matter when trying to read data back from the buffer immediately.
Repro steps

use rand::RngCore;
use wgpu::util::DeviceExt;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ft = async {
        let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
            backends: wgpu::Backends::all(),
            ..Default::default()
        });
        let (device, queue) = instance
            .request_adapter(&wgpu::RequestAdapterOptions {
                power_preference: wgpu::PowerPreference::default(),
                compatible_surface: None,
                force_fallback_adapter: false,
            })
            .await
            .unwrap()
            .request_device(
                &wgpu::DeviceDescriptor {
                    label: None,
                    required_limits: wgpu::Limits::default(),
                    required_features: wgpu::Features::MAPPABLE_PRIMARY_BUFFERS,
                },
                None,
            )
            .await
            .unwrap();
        let mut noisy_data = [5; 4096 * 4];
        rand::thread_rng().fill_bytes(&mut noisy_data);
        let mut rand_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: Some("to sort"),
            contents: &noisy_data,
            usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::MAP_READ,
        });
        //uncomment to fix: queue.submit([]);
        let (tx, rx) = futures_intrusive::channel::shared::oneshot_channel();
        rand_buffer
            .slice(..)
            .map_async(wgpu::MapMode::Read, move |result| {
                tx.send(result).unwrap();
            });
        device.poll(wgpu::Maintain::Wait);
        rx.receive().await.unwrap().unwrap();
        let slice = rand_buffer.slice(..).get_mapped_range();
        println!("{:?}", &slice[0..10]);
    };
    pollster::block_on(ft);
    Ok(())
}

Expected vs observed behavior
Zeros are printed because the buffer wasn't filled; the expected output is the random data passed as contents. Adding queue.submit([]) before mapping fixes this.
The [docs](https://docs.rs/wgpu/latest/wgpu/struct.Device.html#method.create_buffer_init) say:

Creates a Buffer with data to initialize it.

This is misleading, since the buffer may not actually be filled until a later Queue::submit().
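For reference, here is the fixed tail of the repro (inside the same async block, using the queue and rand_buffer bindings from above); the only change is the extra submit before map_async:

// Workaround: submit an empty command list first, presumably so that the
// pending buffer initialization is flushed before the read-back mapping.
queue.submit([]);

let (tx, rx) = futures_intrusive::channel::shared::oneshot_channel();
rand_buffer
    .slice(..)
    .map_async(wgpu::MapMode::Read, move |result| {
        tx.send(result).unwrap();
    });
device.poll(wgpu::Maintain::Wait);
rx.receive().await.unwrap().unwrap();

// The mapped range now contains the random contents passed to create_buffer_init.
let slice = rand_buffer.slice(..).get_mapped_range();
println!("{:?}", &slice[0..10]);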

Platform
I am running Ubuntu 22 on an Inspiron 14.

Interesting corner case! I believe this should be fixed either by not completing the read mapping as long as there's data scheduled to be copied into the buffer (which we would then document as only being able to finish during a Queue::submit), or by scheduling the necessary buffer copies during Device::poll in addition to Queue::submit. On one hand I prefer the latter, but on the other it's a bit odd since WebGPU doesn't have poll in the first place (it instead has "give control back to the browser and wait an arbitrary amount of time"). Not sure yet what would be better 🤔

Naturally this isn't specific to the create_buffer_init utility function but applies to all buffers with on-init mapping.
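For illustration, an untested sketch of the same pattern with a plain Device::create_buffer (reusing the device and noisy_data bindings from the repro above) should show the same behavior:

// Untested sketch: a plain buffer that is mapped at creation, written through
// the mapped range, and unmapped. The copy into the buffer is expected to be
// deferred in the same way as with create_buffer_init.
let buffer = device.create_buffer(&wgpu::BufferDescriptor {
    label: Some("on-init mapped"),
    size: noisy_data.len() as u64, // must be a multiple of COPY_BUFFER_ALIGNMENT
    usage: wgpu::BufferUsages::MAP_READ,
    mapped_at_creation: true,
});
{
    let mut view = buffer.slice(..).get_mapped_range_mut();
    view.copy_from_slice(&noisy_data);
}
buffer.unmap();
// Mapping this buffer again right away (map_async + device.poll) without an
// intervening queue.submit(..) is expected to read zeros, just like above.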

Is this still observable without MAPPABLE_PRIMARY_BUFFERS?

Is this still observable without MAPPABLE_PRIMARY_BUFFERS?

It doesn't appear to depend on MAPPABLE_PRIMARY_BUFFERS; the following code exhibits the unexpected behavior without it:

[dependencies]
wgpu = "0.19"
pollster = "0.3"
futures-intrusive = "0.5"
rand = "0.8"
use rand::RngCore;
use wgpu::util::DeviceExt;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ft = async {
        let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
            backends: wgpu::Backends::all(),
            ..Default::default()
        });
        let (device, queue) = instance
            .request_adapter(&wgpu::RequestAdapterOptions {
                power_preference: wgpu::PowerPreference::default(),
                compatible_surface: None,
                force_fallback_adapter: false,
            })
            .await
            .unwrap()
            .request_device(
                &wgpu::DeviceDescriptor {
                    label: None,
                    required_limits: wgpu::Limits::default(),
                    required_features: wgpu::Features::empty(),
                },
                None,
            )
            .await
            .unwrap();
        let mut noisy_data = [5; 4096 * 4];
        rand::thread_rng().fill_bytes(&mut noisy_data);
        let mut rand_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: Some("to sort"),
            contents: &noisy_data,
            usage: wgpu::BufferUsages::MAP_READ,
        });
        //uncomment to fix: queue.submit([]);
        let (tx, rx) = futures_intrusive::channel::shared::oneshot_channel();
        rand_buffer
            .slice(..)
            .map_async(wgpu::MapMode::Read, move |result| {
                tx.send(result).unwrap();
            });
        device.poll(wgpu::Maintain::Wait);
        rx.receive().await.unwrap().unwrap();
        let slice = rand_buffer.slice(..).get_mapped_range();
        println!("{:?}", &slice[0..10]);
    };
    pollster::block_on(ft);
    Ok(())
}

I personally would prefer the latter option as well, though I don't use wgpu on the web. I think the underlying problem is that it can be confusing for a utility function on Device to require the use of a Queue. It might be nice to do both, but anything that makes the proper use of on-init mapping clear is better than leaving a strange edge case.
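For context, a rough, simplified paraphrase of what the DeviceExt::create_buffer_init utility does under the hood (not the exact wgpu::util source): it writes the data through an on-creation mapping and never touches the Queue itself, which is why the actual upload can still be pending afterwards.

// Simplified, untested paraphrase of the create_buffer_init pattern; the real
// wgpu::util implementation handles empty contents and other details.
fn create_buffer_init_sketch(
    device: &wgpu::Device,
    descriptor: &wgpu::util::BufferInitDescriptor<'_>,
) -> wgpu::Buffer {
    // Mapped-at-creation buffer sizes must be aligned to COPY_BUFFER_ALIGNMENT.
    let unpadded = descriptor.contents.len() as u64;
    let align = wgpu::COPY_BUFFER_ALIGNMENT;
    let padded = ((unpadded + align - 1) / align) * align;

    let buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: descriptor.label,
        size: padded,
        usage: descriptor.usage,
        mapped_at_creation: true,
    });
    {
        let mut view = buffer.slice(..).get_mapped_range_mut();
        view[..descriptor.contents.len()].copy_from_slice(descriptor.contents);
    }
    // Unmapping schedules the write; no Queue is involved here, so the upload
    // may only become visible after a later Queue::submit.
    buffer.unmap();
    buffer
}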