[BUG]-ocl_af_app.rs crashes every time
nodeSpace opened this issue · 10 comments
Reproducible Code and/or Steps
Running the example here: https://github.com/arrayfire/arrayfire-rust/blob/master/opencl-interop/examples/ocl_af_app.rs is crashing at this line: let ptr = af_buffer.device_ptr();
with the error:
(exit code: 0xc000041d)
Process finished with exit code -1073740771 (0xC000041D)
Weirdly, if I spawn a new thread and run it there, or if I run it directly from 'fn main()' (rather than embedded in the UI of my application), it gives this error instead:
(exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
Process finished with exit code -1073741819 (0xC0000005)
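For anyone comparing the two forms of each exit code above: the negative decimal number and the hexadecimal value are the same 32-bit pattern, read as signed and unsigned respectively. A minimal sketch of that conversion (the helper name `status_as_signed` is made up for illustration and is not part of any crate):

```rust
/// Reinterprets a 32-bit status code as a signed integer.
/// Hypothetical helper, used only to show the arithmetic.
fn status_as_signed(status: u32) -> i32 {
    // `as` keeps the bit pattern and changes only the interpretation.
    status as i32
}

fn main() {
    // The two codes reported in this issue.
    for &status in [0xC000_041D_u32, 0xC000_0005_u32].iter() {
        println!("{:#010X} -> {}", status, status_as_signed(status));
    }
}
```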
This has happened every time I've run the full example code from ocl_af_app.rs.
System Information
af::info() prints out:
ArrayFire v3.8.0 (OpenCL, 64-bit Windows, build d99887a)
-0- NVIDIA: GeForce GTX 1060 3GB, 3072 MB
[1] NVIDIA: GeForce GTX 1060 3GB, 3072 MB
Driver version is: 27.21.14.5671
Any idea what could be causing this issue?
Checklist
- [x] Using the latest available ArrayFire release
- [x] GPU drivers are up to date
I was able to reproduce a problem, though I don't think it is the exact one you are having. I am looking into it. I believe this has something to do with a recent fix in v3.8.
Looking at the docs, at the end here: https://arrayfire.org/docs/unifiedbackend.htm
Don't: Do not use custom kernels (CUDA/OpenCL) with the Unified backend
This is another area that is a no go when using the Unified backend. It is not recommended that you use custom kernels with the Unified backend. This is mainly because the Unified backend is meant to be ultra portable and should use only ArrayFire and native CPU code.
Do you think this might be causing the issue, since the set backend line was added?
af::set_backend(af::Backend::OPENCL);
Although if so I don't know of another way to force arrayfire to use the OpenCL backend
I don't think that is the reason, because when I tested these examples from the interop crate I always used the unified API, and they worked fine earlier. In fact, the unified API is the main way we provide other language wrappers. Let me look into it. I will try to get back to you as soon as I can.
I found the problem: it was a missing retain in the example code itself. You need the following additional lines before passing the buffer down to ArrayFire. A silly bug I introduced, sorry about the inconvenience it caused.
unsafe {
    retain_mem_object(&buffer).unwrap();
}
let mut af_buffer = af::Array::new_from_device_ptr(
    buffer.as_ptr() as *mut f32,
    af::Dim4::new(&[dims[0] as u64, 1, 1, 1]),
);
There seems to be another, larger issue here, and it could be something in ArrayFire itself. I am not sure yet where the double release is happening. Theoretically, a retain before passing cl_mem to ArrayFire should handle the releases on that object just fine. Somehow, there is an additional release call happening even with just the code below, where we only create an Array from a cl_mem and exit the program.
af::set_backend(af::Backend::OPENCL);
let platform_id = ocl_core::default_platform().unwrap();
let device_ids = ocl_core::get_device_ids(&platform_id, None, None).unwrap();
let device_id = device_ids[0];
let context_properties = ContextProperties::new().platform(platform_id);
let context =
    ocl_core::create_context(Some(&context_properties), &[device_id], None, None).unwrap();
let queue = ocl_core::create_command_queue(&context, &device_id, None).unwrap();
let dims = [8, 1, 1];
let mut vec = vec![1.0f32; dims[0]];
let buffer = unsafe {
    ocl_core::create_buffer(
        &context,
        ocl_core::MEM_READ_WRITE | ocl_core::MEM_COPY_HOST_PTR,
        dims[0],
        Some(&vec),
    )
    .unwrap()
};
ocl_core::finish(&queue).unwrap(); //sync up before switching to arrayfire
afcl::add_device_context(device_id.as_raw(), context.as_ptr(), queue.as_ptr());
afcl::set_device_context(device_id.as_raw(), context.as_ptr());
af::info();
println!("Ref Count: {}",
    ocl_core::get_mem_object_info(&buffer, ocl_core::MemInfo::ReferenceCount).unwrap());
let mut af_buffer = {
    unsafe { retain_mem_object(&buffer).unwrap(); };
    af::Array::new_from_device_ptr(
        buffer.as_ptr() as *mut f32,
        af::Dim4::new(&[dims[0] as u64, 1, 1, 1])
    )
};
println!("Ref Count: {}",
    ocl_core::get_mem_object_info(&buffer, ocl_core::MemInfo::ReferenceCount).unwrap());
af::af_print!("GPU Buffer before modification:", af_buffer);
af::set_device(0); // Cannot pop when in Use, hence switch to another device
afcl::delete_device_context(device_id.as_raw(), context.as_ptr());
I shall post an update once I have a fix for this, hopefully very soon.
Something funny is definitely going on...I couldn't get your fix to work for me until I did this instead:
unsafe {
    ocl_core::retain_mem_object(&buffer).unwrap();
    ocl_core::retain_mem_object(&buffer).unwrap();
}
So the final file that ran without an error for me is just the two retain calls above added to https://github.com/arrayfire/arrayfire-rust/blob/master/opencl-interop/examples/ocl_af_app.rs.
@nodeSpace That is the double release I was referring to, which is why the extra retain lets the program exit cleanly. Only one retain should be required before passing the buffer to ArrayFire, so that the release call by ArrayFire doesn't invalidate the buffer object in Rust and vice versa. Somehow there is a third release call on cl_mem happening inside ArrayFire upstream, which I am trying to track down.
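To make the retain/release bookkeeping described above concrete, here is a toy model of the reference count. It is only a sketch: `ToyMem` and `simulate` are hypothetical names, it does not call OpenCL or ArrayFire, and it assumes the object starts with a count of 1 and that releasing at zero is the failure being discussed.

```rust
/// Toy stand-in for the reference count of a cl_mem object.
/// Hypothetical type, not part of ocl_core or ArrayFire.
#[derive(Debug)]
struct ToyMem {
    ref_count: i32,
}

impl ToyMem {
    /// A freshly created object is assumed to start with one reference.
    fn new() -> Self {
        ToyMem { ref_count: 1 }
    }

    fn retain(&mut self) {
        self.ref_count += 1;
    }

    /// Fails when the count is already zero, i.e. an extra release.
    fn release(&mut self) -> Result<(), String> {
        if self.ref_count == 0 {
            return Err("release on an already freed object".to_string());
        }
        self.ref_count -= 1;
        Ok(())
    }
}

/// Applies `retains` extra retains followed by `releases` releases and
/// returns the final count, or the first error encountered.
fn simulate(retains: u32, releases: u32) -> Result<i32, String> {
    let mut mem = ToyMem::new();
    for _ in 0..retains {
        mem.retain();
    }
    for _ in 0..releases {
        mem.release()?;
    }
    Ok(mem.ref_count)
}

fn main() {
    // Intended case: one retain, one release each from Rust and ArrayFire.
    println!("1 retain, 2 releases: {:?}", simulate(1, 2));
    // The behaviour seen in this thread: a third, unexpected release.
    println!("1 retain, 3 releases: {:?}", simulate(1, 3));
    // The workaround of retaining twice absorbs the extra release.
    println!("2 retains, 3 releases: {:?}", simulate(2, 3));
}
```

In this model the second retain only hides the extra release; it does not remove it, which matches why a fix in ArrayFire itself is still needed.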
Found the main issue. It is a regression introduced in v3.7.2. Interestingly it wasn't encountered until this particular use case.
I will soon send in a PR for it to the upstream and it would be available in the next fix release.
Sorry about the inconvenience as far as this example is concerned. I missed adding the required retain before passing the cl_mem to ArrayFire, and that is the only fix needed at the Rust wrapper level. I will fix this soon too.
Thanks for reporting this!
Here's the fix for the example - 87a331e
Even though it looks like a fix to the example only, I believe it shows users a key aspect of how to use the crate itself. I will do a quick new release as soon as I can.
arrayfire/arrayfire#3091 is the upstream. Closing since the example has been fixed.
Thanks for reporting this!
No problem! Thanks for maintaining/creating arrayfire-rust!