kenba/opencl3

Option for fallible ExecuteKernel::set_arg or better Error messages

Closed this issue · 12 comments

Hey!

First of all, thanks for this crate. It has mostly worked out great (were getting to the mostly here;-P, wasn't a big issue).

Where I got a small issue, that was fixed quickly, was when I used the ExecuteKernel helper to execute. It turns out I set parameters wrong (usize instead of cl_int).

However the Error-Message I got, wasn't great:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ClError(-51)',  /home/[cut]/.cargo/registry/src/github.com-1ecc6299db9ec823/opencl3-0.8.0/src/kernel.rs:356:50

After I Isolated where that code was getting called from, and looking up the Errorcode, it was a quick fix.

But I understand that ExecuteKernel is meant to be an "Easy to use" API, as such I think this could be improved.

The issue is this line in your library:

self.kernel.set_arg(self.arg_index, arg).unwrap();

Kernel::set_arg returns an Result<_, ClError>. As you said here you're unwilling to adjust the Debug impl, but as you unwrap in the library there isn't much choice to me.

Also while we're at it, I would, like requested in there, prefer if the Display impl would also display the OpenCL Errornumber.
While Display is supposed to be used more superfluously, its still an error, having a straight reference to the actual underlying issue would be prefferable.

Anyways, for this particular issue I'd recommend either replacing the unwrap in your impl with something that also displays a nice error message, or / and prefferably exposing a fallible set_arg operation on ExecuteKernel.

regards, Samuel

Oh, and the #Panics section in the documentation is wrong as it is right now, as it will also panic if you provide the wrong SIZED type, as I demonstrated (it would not panic on the wrong type, I assume, don't know OpenCL good enough).

kenba commented

Thanks Samuel, I didn't consider these kinds of bugs when I designed the ExecuteKernel builder.
A better solution would be to return the error code instead of panicking. However, I'll need some time to investigate it and determine whether it's possible. What do you think?

Changing to Fallible API (=> returning a Result) is a breaking change, but I'd prefer it. Alternatively offer a fallible alternative and warn against the issues in the original method.

If you DO decide to return a result then it would probably be fitting to also embed the "too many arguments" situation in the error, some error enum like you'd do with thiserror.

It would also be great if you or I could find out if my assumption, that passing the wrong type but with the right size doesn't lead to an error could be tested, and thst situation explained in the documentation. It's an invariant, leaving that open without even documenting it isn't the Rust way. As I have exams this week I won't be able to see to that before the weekend at earliest.

kenba commented

Having given this issue more consideration, changing to a Fallible API would be both a breaking change and a mistake.

This issue was due to a bug in your code: "It turns out I set parameters wrong (usize instead of cl_int)." so as you suggested, this is just a duplicate of #34. I'm sorry that you had to look up ClError(-51) like a C/C++ developer would, but it "wasn't a big issue" for you. I'm not willing to change line 356 for this issue.

ExecuteKernel uses the builder pattern to simplify setting up and enqueueing a Kernel on a CommandQueue, it saves you from having to set arg indices, etc. It is relatively "easy to use" but it is not fool proof! OpenCL is not like Rust, the borrow checker can't save you from calling a kernel with incorrect arguments. You need to take care when setting kernel arguments. Your mistake was only caught because a usize is a different size to a cl_int. OpenCL cannot detect that you used unsigned int where the kernel required a signed int!

If you want a Fallible API, you can call set_arg on a kernel directly instead of using the ExecuteKernel builder. You can then handle the results however you want.

You are quite right about the #Panics section in the documentation.
Please can you list ALL the other panics that set_arg can create so I can ensure that it is complete? ;-P

Hmm, you're not wrong in that, if indeed it doesn't fail on the wrong type, you can't ensure entirely correct semantics anyways, perhaps it might even suggest a safety that isn't there.

I'll hopefully get to testing a bit more this afternoon to validate that strong suspicion (I wouldnt know how to test that with C APIs) and see if I can find other issues.

But I would like to highlight another issue with the current approach, that I indeed did not correctly highlight in my above post: Finding the location where the error was caused.

My Programm was a toy program I wrote to test OpenCL. It just instantiates one kernel with 5 arguments. Thus it was easy to find the error location once I looked up the error and saw that the error code implies a kernel argument.

But for a larger program with multiple kernels and/or conditional invocations it would not always be so easy.

If you unwrap panics in your libary code than just the library location gets embedded, which, without enabling backtraces, doesn't show the line in the user's code that caused the issue. If you make them unwrap it themself it would directly show them the right location.

In addition to that, even with the current state, since panics are not used as control flow in Rust, you could easily change the error message by not directly unwrapping and showing at best both the error code and the interpretation.

Similarly I would appreciate if, as I said earlier, the debug display implementation of CLError would also show the error code. As you said in the issue I linked to, newer errors would show up as unknown, if you also add the error code the user doesn't have to figure out how to get to the error code, and searching for the error code is more reliable than for the error message. I did create a wrapper around CLError in my code for that singular purpose.

Uff. I did plan to do this while writing the train, but there is an absolute chaos where I have to search for alternative routes, and can't find a place to sit because half the trains get dropped. Probably won't hear from me again today.

As of now I'm aware of the following panics:

  • wrong amount of arguments
  • wrong size fortthe current argument

And the following invariant that does not get tested and should be mentioned:

  • you can pass an argument of the wrong type without error
kenba commented

If you unwrap panics in your libary code than just the library location gets embedded, which, without enabling backtraces, doesn't show the line in the user's code that caused the issue. If you make them unwrap it themself it would directly show them the right location.

I use panics for bugs, see "When panicking indicates a bug in the program" in unwrapping-isnt-evil .
"which, without enabling backtraces, doesn't show the line in the user's code "; then just enable backtraces!

However, I do agree with you about the location of the error.
I can change line 356 to indicate which argument was at fault and both the error code and text for you, e.g.:

        match self.kernel.set_arg(self.arg_index, arg) {
            Err(e) => panic!(
                "ExecuteKernel::set_arg invalid kernel arg at index: {}, {:?}, {}",
                self.arg_index, e, e
            ),
            _ => (),
        }

Which gives:

panicked at 'ExecuteKernel::set_arg invalid kernel arg at index: 3, ClError(-51), CL_INVALID_ARG_SIZE', 

If you know a more elegant solution (i.e. without using match) then please let me know, because there are a bunch of other methods, e.g. set_arg_local_buffer, set_arg_svm, etc. that would also benefit from this change.

I do also panic and unwrap (or expect), but only if the I variants I test are these I have to uphold on my own. Otherwise IMO it's a good idea to leave the behavior to the user.

But anyways. I may not be able to use my Laptop, but I can write on mobile:

Regarding the match, wouldn't if let Err(e) improve that? Otherwise, you could also map the err case with map_err or similar before unwrapping.

I have not ever used it, but perhaps you could investigate the track_caller attribute if you want to keep the unwrap: https://doc.rust-lang.org/reference/attributes/codegen.html#the-track_caller-attribute

kenba commented

Thank you Samuel. I've changed the unwraps to map_errs and committed the change to the develop branch.

Thanks! Would you at all be interested in a PR adding the track_caller attribute to improve where the error is shown to be thrown, or do you not want that complexity? I did look a bit closer at the documentation and it doesn't seem too difficult.

And the situation with a same-sized type is mentioned nowhere.

Otherwise I'd consider this closed.

kenba commented

The situation with a same-sized type is a fundamental issue with using OpenCL; you must be careful when enqueueing OpenCL kernels. Using incorrect arguments is just one of many issues that require developers to take care - there's no borrow checker to save you from yourself.

Please feel free to raise a PR for the track_caller attribute.

Closing this as with the changes in #50 and by you every mayor point has been addressed.

Thanks for your time!