kenba/opencl3

Coarse-grained SVM has to be mapped before usage!

Closed this issue · 5 comments

I have a system with CL_DEVICE_SVM_COARSE_GRAIN_BUFFER support only.
SvmVec does not work as expected there, since the buffer has to be mapped via CL_MAP_READ / CL_MAP_WRITE before reading / writing and unmapped before executing a kernel. While an SVM buffer is mapped, you cannot grow it, so SvmVec::push() etc. won't work. Also, SvmVec::with_capacity_zeroed() does not actually zero the values, since it doesn't map the buffer first.

See https://software.intel.com/content/www/us/en/develop/articles/opencl-20-shared-virtual-memory-overview.html#access
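For example, zeroing a coarse-grained vector should only work while it is mapped; a rough (untested) sketch along the lines of the README example:

        // Sketch: a coarse-grained SVM vector has to be mapped before it can be zeroed
        let mut zeroed = SvmVec::<cl_int>::with_capacity(&context, svm_capability, ARRAY_SIZE);
        unsafe { zeroed.set_len(ARRAY_SIZE) };
        queue.enqueue_svm_map(CL_BLOCKING, CL_MAP_WRITE, &mut zeroed, &[]).unwrap();
        zeroed.fill(0);
        queue.enqueue_svm_unmap(&mut zeroed, &[]).unwrap();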

I tested this with the map/unmap calls correctly applied and everything worked just fine.

Here is a modified part of the README.md sample that works for me:

        // Create an OpenCL SVM vector for the input data
        let mut test_values = SvmVec::<cl_int>::with_capacity(&context, svm_capability, ARRAY_SIZE);
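        // with_capacity leaves the vector empty (len 0); set_len exposes the (uninitialised) elements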
        unsafe { test_values.set_len(ARRAY_SIZE) };

And then I could either:

        if !is_fine_grained_svm {
            queue.enqueue_svm_map(CL_BLOCKING, CL_MAP_WRITE, &mut test_values, &[]).unwrap();
        }
        test_values.copy_from_slice(&value_array);
        if !is_fine_grained_svm {
            queue.enqueue_svm_unmap(&mut test_values, &[]).unwrap();
        }

Or just:

        queue.enqueue_svm_mem_cpy(
            CL_BLOCKING,
            test_values.as_mut_ptr() as *mut c_void,
            value_array.as_ptr() as *const c_void,
            ARRAY_SIZE * mem::size_of::<cl_int>(),
            &[] ).unwrap();

The memcpy variant seems to sync the memory automatically.
Either way, it takes away a lot of comfort.
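The same mapping dance applies on the way back: the output vector has to be mapped with CL_MAP_READ before the results can be read on the host. A sketch, assuming results is the output SvmVec the kernel wrote to:

        // Sketch: map the output SVM vector for reading before touching it on the host
        if !is_fine_grained_svm {
            queue.enqueue_svm_map(CL_BLOCKING, CL_MAP_READ, &mut results, &[]).unwrap();
        }
        let host_results: Vec<cl_int> = results.to_vec();
        if !is_fine_grained_svm {
            queue.enqueue_svm_unmap(&mut results, &[]).unwrap();
        }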

kenba commented

Thank you for taking the time to analyse this @Draghtnod.
Your issue is closely related to issue #32, which also involves CL_DEVICE_SVM_COARSE_GRAIN_BUFFER support.

I would like to fix this issue in SvmVec, but as you point out (and the linked Intel article explains), coarse grained SVM needs to be mapped before it can be grown or initialised.

The easiest solution for me is to ensure that SvmVec can only be used with fine grained SVM; coarse grained SVM would just have to use conventional buffers, as per issue #32. However, I would like to support coarse grained SVM. Do you have any ideas how best to support it without breaking the "comfort" of fine grained SVM?

kenba commented

@Draghtnod there is a test test_opencl_svm_example in integration_test.rs that shows how to use both fine and coarse grained SVM. However, the coarse grained SVM test was incorrect. Fortunately, I've been able to update one of my systems and I've now corrected it to call enqueue_svm_map before setting the values in the input SVM vectors.

kenba commented

@Draghtnod I've just incorporated changes to fix this issue in version 0.5 of the library on crates.io.
If you are satisfied with the changes then please close this issue.

Draghtnod commented

@kenba I read your code changes and I think this is fine. SvmVec may still be useful on systems with only CL_DEVICE_SVM_COARSE_GRAIN_BUFFER, so disabling it would be a shame. On the other hand, shifting the hassle of queuing all the mapping/unmapping automatically into SvmVec, so that it feels like a fine grained buffer, would take away the flexibility to enqueue things efficiently. Leaving the mapping/unmapping in the hands of the user, as you did, is the right thing to do in my opinion.

Maybe it would get a little more comfy for CL_DEVICE_SVM_COARSE_GRAIN_BUFFER users with some SvmVec functions like:

    pub fn map_read(&mut self, queue: &CommandQueue) {
        if !self.is_fine_grained() {
            queue.enqueue_svm_map(CL_BLOCKING, CL_MAP_READ, self, &[]).unwrap();
        }
    }

This way you could use SvmVec in a conservative manner (not pushing beyond capacity() and so on) and still stay coarse-buffer conformant with relatively high convenience, just by adding a few map_read()/map_write()/unmap() calls instead of writing different SVM logic for every buffer type.
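The map_write() and unmap() counterparts would presumably follow the same pattern (again just a sketch, not tested):

    // Sketches of the counterparts, following the same pattern as map_read()
    pub fn map_write(&mut self, queue: &CommandQueue) {
        if !self.is_fine_grained() {
            queue.enqueue_svm_map(CL_BLOCKING, CL_MAP_WRITE, self, &[]).unwrap();
        }
    }

    pub fn unmap(&mut self, queue: &CommandQueue) {
        if !self.is_fine_grained() {
            queue.enqueue_svm_unmap(self, &[]).unwrap();
        }
    }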