kenba/opencl3

Undefined behaviour when using underlying OpenCL pointers without any unsafe

awused opened this issue · 11 comments

If the underlying OpenCL pointers to objects are used it's possible to exhibit undefined behaviour without any uses of unsafe in application code. This could happen when mixing opencl3 with another library, or trying to bypass the lack of Clone/Sync on types like Context.

use opencl3::command_queue::{CommandQueue, CL_QUEUE_PROFILING_ENABLE};
use opencl3::context::context::release_context;
use opencl3::context::Context;
use opencl3::device::{get_all_devices, Device, CL_DEVICE_TYPE_GPU};
use opencl3::Result;

fn main() -> Result<()> {
    let device_id = *get_all_devices(CL_DEVICE_TYPE_GPU)?
        .first()
        .expect("no device found in platform");
    let device = Device::new(device_id);

    let context = Context::from_device(&device).expect("Context::from_device failed");

    release_context(context.get())?;

    let _this_will_segfault =
        CommandQueue::create(&context, context.default_device(), CL_QUEUE_PROFILING_ENABLE)
            .expect("CommandQueue::create failed");

    Ok(())
}
kenba commented

You raise an interesting point but your example is unrealistic: explicitly releasing an OpenCL pointer using a lower level cl3 library function is not something many developers would consider doing to a type which implements the Drop trait to release it for them! Note: the lack of Clone/Sync on Context is done for a good reason, see PR #7. You should consider why it’s not provided before attempting to bypass it. There may be more realistic examples where an OpenCL pointer is invalidated after an OpenCL function call other than release. Can you find one?

What do you propose to fix this issue:

  • Should all the method be unsafe, completely changing the API?
  • Should each type track the validity of the underlying OpenCL pointer, thus slowing down an API which purely exists for performance optimisation?
  • Do you have a better solution?

I actually did read through that bug report as this started in an attempt to replace the unmaintained ocl which does allow cloning of the thread safe C objects. Reading through those PRs and issues led me to believe opencl3 was taking a very conservative approach for safety, since it would require an API redesign, which I accept the reasoning behind, but it then surprised me how easy it was to exhibit UB as I started to play with my options.

It may not be realistic for someone to do exactly this, since it's just example code, but I'd at least expect that the release_* and retain_* methods be unsafe as the user is now mixing your automatic reference counting with their own. The problem is not really that someone can do this, they could always bypass cl3 and call C methods directly, the problem is they can do it without writing unsafe anywhere.

kenba commented

FYI there are no release_* and retain_* methods in this library, they are all functions in the cl3 library.

It doesn't seem reasonable to me for the release_* and retain_* functions in the cl3 library to be declared unsafe for this issue.

I would prefer to make the get* methods in this library unsafe, since that would require calls to release_* to be unsafe. However, as you pointed out in issue #53, OpenCL has other means to cause undefined behaviour and I’m happy to bet that you haven’t found all of them yet. Perhaps the whole library should be declared unsafe! ;)

I do generally see libraries take one of two paths: either leaving everything unsafe and doing no validation or trying to expose only safe operations with some unsafe escape hatches. Especially from the discussion around not making things Sync it seemed like this library was going beyond what is really required to make sure things are safer. As an example, ocl which I was trying to migrate from, requires the use of unsafe methods to make use of CL_MEM_USE_HOST_PTR since it requires that the user uphold some additional invariants.

FYI there are no release_* and retain_* methods in this library, they are all functions in the cl3 library.

opencl3 re-exports them publicly, so they're part of this crate's API. I do not have to directly depend on cl3 to use them.

vmx commented

FYI there are no release_* and retain_* methods in this library, they are all functions in the cl3 library.

opencl3 re-exports them publicly, so they're part of this crate's API. I do not have to directly depend on cl3 to use them.

I agree with @awused here. From the API docs it's not clear that those are cl3 functions. Taking opencl3::context as example, perhaps it would make sense making the cl3::context import private.

kenba commented

Thank you @vmx I appreciate your thoughts on this issue.

I understand making the function imports private to clarify that they are cl3 functions and not a part of the opencl3 API. I agree that the opencl3::context module could just import the functions privately, while other modules (e.g. memory) could import the cl3 functions privately and cl3 constants and types publicly to minimise changes to the existing API. However, this change does not fix the original issue of being able to call the cl3 release_* and retain_* functions without unsafe.

Do you consider this change to be enough, or do you think that access to the underlying OpenCL pointer be via an unsafe method as well?

vmx commented

Do you consider this change to be enough, or do you think that access to the underlying OpenCL pointer be via an unsafe method as well?

I would say it would be a first step. But probably most of cl3 should be unsafe. I agree that having almost everything declared as unsafe is a bit painful to use, but sadly that's what OpenCL is. But I also agree with @awused, that I would expect that if I don't have any unsafe keyword in my own code, that I won't run into segmentation faults, no matter what I do.

This means that I'd like to see opencl3 being a safe wrapper (where possible) and having cl3 as kind of it's unsafe sibling which you can use, but have to be careful.

kenba commented

But probably most of cl3 should be unsafe.

@vmx OpenCL has many functions that could be considered unsafe from a Rust perspective (see issue #52 ), but making most of cl3 unsafe is overkill IMHO.

An intention of this library was to simplify the use of OpenCL objects by managing them using the Drop trait. @awused 's initial example shows that the release_* and retain_* functions of the cl3 library are unsafe in this library as they alter the underlying OpenCL C library's object reference counters, potentially invalidating this library's objects.

Just changing the release_* , retain_* and svm_free functions to be unsafe in the cl3 library would fix this issue.

could be considered unsafe from a Rust perspective (see issue #52 )

That one is definitely unsafe from a Rust perspective, and it shouldn't be reachable in safe Rust. That one even more than this one because CL_MEM_USE_HOST_PTR is something actually used. Since it's a bit flag you have to get a little bit creative, but solutions exist to make CL_MEM_USE_HOST_PTR unsafe without requiring all buffer initialization code to be unsafe.

the release_* and retain_* functions of the cl3 library are unsafe in this library

They're unsafe in any context, even if I'm using cl3 directly, or if there were a bug in opencl3, and release_ was called twice on a pointer with only one reference I'd get UB. I need to know, in advance, that the OpenCL object is valid before I call release_ or retain_ and that's the textbook definition of unsafe: "Code or interfaces whose memory safety cannot be verified by the type system." (from unsafe)

kenba commented

@awused I have created kenba/cl3#25 to fix this issue and reopened issue #52 to capture the issue with other functions and methods causing undefined behaviour.

I would appreciate it if you could list all the cl3 functions that you believe can lead to undefined behaviour in kenba/cl3#26.

kenba commented

Changes published to crates.io.