Should we add "backend::none"?
Opened this issue · 12 comments
I think we made a mistake in SYCL 2020 for event::get_backend
in two cases:
- The default constructed
event
- The
event
that is returned when you submit a host task
In both of these cases, there's no obvious backend. As a result, there's no obvious backend interop that you can do with these events. For example, what should get_native
return for these events?
I think there's an implicit assumption that when object.get_backend
returns backend::opencl
, an application is allowed to use OpenCL backend interop APIs on that object. Therefore, I think it was a mistake to return backend::opencl
for events that are created in the two cases I list above.
I think the cleanest solution would be to introduce a new enumerator called backend::none
for cases like this. When an object reports backend::none
, we would not support any backend interop for that object.
I also think backend::none
would be useful for implementations that do not layer on any established "back end". For example, you could imagine an implementation that implements SYCL directly on hardware, without going through any documented backend API. Such an implementation could return backend::none
for all calls to get_backend
.
I think the specification should clearly specify what backend
enumerator is returned, though. Implementations should not have complete freedom to choose on an object-by-object basis. Here are the rules that make sense to me:
-
An implementation has complete freedom to decide what backend is returned from
platform::get_backend
for eachplatform
object . This can be a Khronos-defined enumerator likebackend::opencl
, a vendor-defined enumerator likebackend::ext_oneapi_level_zero
, orbackend::none
. Each backend enumerator (except forbackend::none
) should have a matching backend interop specification. Khronos will define this specification if the backend enumerator is defined by Khronos. A vendor should define this specification if the backend enumerator is defined by the vendor. -
The SYCL specification should clearly document what is returned from all other
class::get_backend
member functions. In most cases, we will say that it returns the same backend as the associated platform. In the two "event" cases that I document above, we will say thatbackend::none
is returned.
This seems like the right balance of implementation freedom and API consistency to me.
Since this affects behavior of SYCL 2020 APIs, I think we should consider this as a bug fix to the SYCL 2020 specification.
Comments?
I did go back and forth in my mind about it.
- The backend-spec can be customized to return a
null pointer
for the interrupt object the backend doesn't want to support viaget_native.
(or just throw). It Just needs to be documented in the backend spec. "easier to ask for forgiveness than permission," as they said in Python
On the other hand, adding backend::none
doesn't seem to add much complexity. I'm not sure if the SYCL spec should say that those 2 APIs about events should always return backend::none
. Who knows what a fancy implementer can do?
I'm not sure if the SYCL spec should say that those 2 APIs about events should always return
backend::none
. Who knows what a fancy implementer can do?
That's a good point. We should not write the core spec to be too proscriptive about what a backend can do. How about this? The core spec definition for get_backend
(for all classes) will say something like:
Returns the backend that underlies this object. When the backend is not
backend::none
, backend interoperation is available for this object as defined by the corresponding SYCL backend specification.
Each backend specification can then provide further guarantees if it wants. For example, the OpenCL backend specification might say something like:
When a platform has
backend::opencl
, SYCL objects that are associated with that platform are also guaranteed to havebackend::opencl
, with the following exception. Anevent
object that is returned when submitting a host task hasbackend::none
. As guaranteed by the core SYCL specification, all objects withbackend::opencl
support the interoperation APIs defined in this specification.
However, I still think it was a mistake for the core SYCL spec to say that the default-constructed event
has the same backend as the default-constructed queue. This was really an arbitrary decision. At the time, we needed to think of some definition for this event's backend, and this is what we came up with. If we decide to add backend::none
to the core spec, then it would make much more sense to say that the default-constructed event has backend::none
.
Returns the backend that underlies this object. When the backend is not backend::none, backend interoperation is available for this object as defined by the corresponding SYCL backend specification.
Sound good to me!
However, I still think it was a mistake for the core SYCL spec to say that the default-constructed event has the same backend as the default-constructed queue. This was really an arbitrary decision. At the time, we needed to think of some definition for this event's backend, and this is what we came up with. If we decide to add backend::none to the core spec, then it would make much more sense to say that the default-constructed event has backend::none.
I think it's still useful to have a backend object for default-constructed events associated with the default-constructed queue.
For example, I write a lot of L0. But generating L0 event is tedious. So my process is to generate a default constructed sycl event and then get the ze_event
associated with it.
In short, I think we should let all those kinds of implementation-choise to the backend spec and not try to formalize anything too much on the "main" spec.
I think it's still useful to have a backend object for default-constructed events associated with the default-constructed queue.
For example, I write a lot of L0. But generating L0 event is tedious. So my process is to generate a default constructed sycl event and then get theze_event
associated with it.
This seems like an asymmetry in our API, though. You can create a default-constructed event only for the backend associated with the default device constructor. What if you wanted to create a default-constructed event for OpenCL or CUDA (on DPC++)? I think it was a mistake for the core SYCL API to favor one backend like this.
If we really want an API to create a "signaled" event that is specific to a particular backend, we should instead create a backend interop API that does this. For example, Level Zero could have something like sycl::ext::oneapi::level_zero::make_event()
.
This seems like an asymmetry in our API, though. You can create a default-constructed event only for the backend associated with the default device constructor. What if you wanted to create a default-constructed event for OpenCL or CUDA (on DPC++)? I think it was a mistake for the core SYCL API to favor one backend like this.
That makes sense. We should remove this requirement from the spec. But I still think we should not put the required 'backend::none' in the core spec; what default constructed event
return can be left as an implementation choice. I don't see any harm with this approach.
If we really want an API to create a "signaled" event that is specific to a particular backend, we should instead create a backend interop API that does this. For example, Level Zero could have something like sycl::ext::oneapi::level_zero::make_event().
I can always do e = Q.single_task([]{})
and call it a day. So special need for a new API :)
I'm just always a little concerned about "over-specifying" stuff and that this will beat us in the future. But I also like clear API. So to be honest I'm a little torn over on this one :)
But the core idea of backend:none
is definitely a good idea to report "unsupported."
That makes sense. We should remove this requirement from the spec. But I still think we should not put the required 'backend::none' in the core spec; what default constructed
event
return can be left as an implementation choice. I don't see any harm with this approach.
I can live with this. We can change the spec to say that the backend for the default-constructed event is implementation-defined.
I think the cleanest solution would be to introduce a new enumerator called backend::none for cases like this. When an object reports backend::none, we would not support any backend interop for that object.
I'm not sure I'm convinced. This introduces an invalid backend value that now in principle needs to be handled by client code wherever in the SYCL API a backend is used, to solve something that is a design error only in the event backend interop API, or perhaps even the OpenCL backend specification (other backends can always say that events are not supported for interop).
The fact that we need some non-trivial rules like you outline to me indicates that backend::none
may not be as simple a solution as it seems.
AdaptiveCpp does not support backend interop for events, for good reason. I think there can also be other cases where an event is not directly connected to a backend event (e.g. you do update_host
on an up-to-date buffer, which results in a no-op). There might also be events that map to events from multiple backends - in the mentioned update_host
case, the relevant events would be from the requiremens of the update_host
, which can be from wherever. In this case, the correct answer to "which backend am I on" might not be "none", but potentially "multiple".
I think that fundamentally the event backend interop API is broken.
The usual thing the SYCL API does when it is asked to do something it cannot comply with is to throw. Why don't we throw here as well?
The fundamental solution to this problem, I think, would be to treat event
as a std::vector
of backend events, each with its own backend value. This vector might be empty in the cases that you mention.
PR #577 makes two related changes. Let's take them separately for a moment. One change adds backend::none
, defining this as a standard way to indicate that a SYCL object is not associated with any documented backend. Nothing in the specification requires an implementation to return this value under any circumstance. If ACPP doesn't need this enumerator, you can just ignore it.
I think backend::none
is a good addition to the specification even ignoring this issue with event
. For example, a SYCL implementation might layer directly on hardware without going through any documented "heterogeneous API" (e.g. OpenCL, CUDA, etc.) These implementations can return backend::none
to indicate that no backend interoperation is available.
Some implementations may choose to return backend::none
even when they do layer on top of a heterogeneous API. For example, an implementation might do this if it does not yet support any backend interoperation APIs.
I think you are claiming that adding backend::none
adds a new burden on applications because they need to check for this when calling get_backend
. However, I don't think there is any additional burden here. An implementation is allowed to add custom enumerators to the backend
enumeration, and any application call to get_backend
needs to check to see which enumerator it gets. Therefore, existing conformant application code must already be doing something like:
if (obj.get_backend() == /*some backend I know about*/) {
/* do interop for that backend */
}
This code will continue to work even if we add a new enumerator backend::none
.
The other change that #577 makes is to relax the requirements on the backend that is associated with the default-constructed event. Previously, the spec required this to be a specific backend (the same backend as the default device.) In retrospect, that seems like it was a mistake. The PR changes this to say that the backend of the default-constructed event is implementation defined. This does not place any new requirement on implementations, so ACPP can continue to do whatever it does now.
In our experience, applications don't want to use backend interop with the default-constructed event anyways, so I think this change will not have an impact on application code.
Your observation that there could be cases where a SYCL event
object has multiple backends is interesting. I'm not sure if DPC++ has this issue. Thinking off the top of my head, we could add a new enumerator backend::multiple
and then also add a new member function get_backends
which returns a vector of backend
values. We should think this through more, though, and I wouldn't want to include this as part of #577.
Responding to this:
The usual thing the SYCL API does when it is asked to do something it cannot comply with is to throw. Why don't we throw here as well?
Most SYCL APIs provide a way to test for error condition beforehand, so you don't need to surround all your API calls with try
/ catch
blocks. I think it should be the same with get_backend
. It would be a pain to require users to add try
/ catch
blocks around all their calls to get_backend
, especially when they need code anyways to test the value that is returned.
PR #577 makes two related changes. Let's take them separately for a moment. One change adds backend::none, defining this as a standard way to indicate that a SYCL object is not associated with any documented backend. Nothing in the specification requires an implementation to return this value under any circumstance. If ACPP doesn't need this enumerator, you can just ignore it.
That's true, but what if we have some API in the future that accepts a backend as a non-template argument? AdaptiveCpp already does this internally, and in this case this change requires handling the invalid value everywhere.
I think backend::none is a good addition to the specification even ignoring this issue with event. For example, a SYCL implementation might layer directly on hardware without going through any documented "heterogeneous API" (e.g. OpenCL, CUDA, etc.) These implementations can return backend::none to indicate that no backend interoperation is available.
Nothing in the SYCL specification requires implementations to expose any kind of backend interop if they don't want to for their own backends. In a non-layered implementation you could just as well have some <vendor>_magic_internal_mechanism
entry in the enum and not expose any interop. I don't think this is any burden for such an implementation currently that requires a change.
Some implementations may choose to return backend::none even when they do layer on top of a heterogeneous API. For example, an implementation might do this if it does not yet support any backend interoperation APIs.
Interop is not required anyway. We do have a way already to advertise that no interop is available: By having a backend enum entry that is not one where a backend specification with an interop model exists.
If we need a way to query interop abilities for cases where the existing backend_traits
do not suffice, I think we should just add an explicit query for that.
I think you are claiming that adding backend::none adds a new burden on applications because they need to check for this when calling get_backend. However, I don't think there is any additional burden here. An implementation is allowed to add custom enumerators to the backend enumeration, and any application call to get_backend needs to check to see which enumerator it gets. Therefore, existing conformant application code must already be doing something like:
That's fair, but this is only the case because the main reason of using backend
so far is interop. Is this going to be the only use of the enum forever?
For example, in the case of buffer-USM interop, we might want to have queries on the buffer like "give me all device pointers for this backend" or "which backends do you have allocations for?"
Your observation that there could be cases where a SYCL event object has multiple backends is interesting. I'm not sure if DPC++ has this issue. Thinking off the top of my head, we could add a new enumerator backend::multiple and then also add a new member function get_backends which returns a vector of backend values. We should think this through more, though, and I wouldn't want to include this as part of #577.
Having backend::multiple
would be even more complicated for users, because that they would first need to check get_backend
, if it's multiple
do another call to get_backends
. In this case, why keep get_backend
at all?
My argument is: A proper solution requires changing the backend model for event
towards using std::vector<backend>
. This is probably not possible as a fix for SYCL 2020. Once we introduce backend::none
we will be stuck with it, since it needs a deprecation period before it could be removed again. As a temporary solution it's way easier to say that there are some edge cases where the API may throw, since we can always say in the future that the API no longer throws once we have a proper solution.
As a temporary solution it's way easier to say that there are some edge cases where the API may throw, since we can always say in the future that the API no longer throws once we have a proper solution.
Doing this would be a really big API break. All the get_backend
member functions are currently declared noexcept
. In addition, there is a LOT of application code out there now that calls get_backend
. We cannot ask all that application code to insert try
/ catch
clauses around all those calls.
Perhaps our disagreement stems from the meaning of get_backend
. My understanding is that when get_backend
returns a certain backend enumerator, it is a promise by the implementation that the SYCL object supports interoperation according to the rules defined in the corresponding backend interoperation specification.
If we agree on this meaning, then I think it becomes clear that the current definition of the default-constructed event makes no sense. The spec currently states that the backend of this event must be the same backend as the device returned by the default selector, this means that the event's get_backend
must return this backend enumerator, and thus the default-constructed event must support interoperation according to that backend specification. There is no logical reason to mandate which backend is associated with the default-constructed event. An implementation may support many backends. It seems very arbitrary that the spec chooses one particular backend and then requires the implementation to provide interoperation with that backend.
Focusing for now on just this one part of the PR, do you agree that it makes sense to relax this mandate and allow the implementation to choose the backend that is associated with the default-constructed event?
Doing this would be a really big API break. All the get_backend member functions are currently declared noexcept.
Okay, that's a good point.
Perhaps our disagreement stems from the meaning of get_backend. My understanding is that when get_backend returns a certain backend enumerator, it is a promise by the implementation that the SYCL object supports interoperation according to the rules defined in the corresponding backend interoperation specification.
Maybe you're right and this is a source of disagreement. I think that ideally, the returned backend is aligned with interop as you say. But I think I see these edge cases more as a defect in the interop API and interop model rather than the backend model. I suspect that intuitively, users might also interpret get_backend
differently. For example, if they have submitted some kernel and get the event, they then might later look at event::get_backend()
because they want to know which backend has processed the kernel submission, e.g. because they know that different backends have different performance characteristics. In that case, the expectation would be that even events that cannot interop would return the backend that the command was submitted to.
Focusing for now on just this one part of the PR, do you agree that it makes sense to relax this mandate and allow the implementation to choose the backend that is associated with the default-constructed event?
Yes, this change would be totally fine for me. I think mandating a specific backend here is probably a remnant of the host backend/host device model from SYCL 1.2.1.
Side note: But before I forget ( I guess I should start a new thread...)
SYCL queues execute kernels on a particular device of a particular context, but can have dependencies from any device on any available SYCL backend.
I guess that means, kernel cannot have a dependency of task running on device who doesn't have a backend (if we follow the interpretation that backend == interop)