KhronosGroup/SYCL-Docs

Expected behaviour of nd_item get_global_id(x) where x >= NDims


What is the expected behaviour of nd_item::get_global_id(x) where x is greater than or equal to the number of dimensions the nd_item is templated on?

e.g.

constexpr int NDims = 1;
q.submit([&](sycl::handler &cgh) {
  cgh.parallel_for<KernelName>(sycl::nd_range<NDims>{globalSize, localSize},
                               [=](sycl::nd_item<NDims> it) {
                                 size_t dim0 = it.get_global_id(0);  // valid: 0 < NDims
                                 size_t dim1 = it.get_global_id(1);  // out of bounds: 1 >= NDims
                                 size_t dim2 = it.get_global_id(2);  // out of bounds: 2 >= NDims
                               });
});

What should it.get_global_id(1) and it.get_global_id(2) return in this case? The spec only states in the nd_item member description of size_t get_global_id(int dimension) const:

Return the constituent element of the global id representing the work-item’s position in the nd-range in the given Dimension.

I see a few options for how the spec could address this:

  1. Return 0
  2. Throw an asynchronous error
  3. Undefined behaviour

Ideally, IMO, returning 0 would be preferable since it is precisely defined and does not affect the execution of the kernel.

As a point of context, OpenCL returns 0 in "out of bound" cases:

The global work-item ID specifies the work-item ID based on the number of global work-items specified to execute the kernel. Valid values of dimindx are 0 to get_work_dim() - 1. For other values of dimindx, get_global_id() returns 0.
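To make the OpenCL-style semantics concrete, here is a minimal sketch in plain C++. It is not SYCL code: `mock_item` is a hypothetical stand-in for `nd_item`, used only to show what "return 0 for any other dimension" would look like.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a work-item's global id; not the SYCL nd_item class.
template <int Dims>
struct mock_item {
  std::array<std::size_t, Dims> global_id;

  // OpenCL-style semantics: a valid dimension returns its component,
  // any other value of `dimension` returns 0.
  std::size_t get_global_id(int dimension) const {
    if (dimension < 0 || dimension >= Dims) {
      return 0;  // bounds check, evaluated at run time if `dimension` is a variable
    }
    return global_id[static_cast<std::size_t>(dimension)];
  }
};
```

With `mock_item<1> it{{7}}`, `it.get_global_id(0)` yields 7 while `it.get_global_id(1)` and `it.get_global_id(2)` both yield 0, which is the behaviour option 1 would mandate.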

Pinging @hdelan for their interest in the issue.

That seems a little surprising. Presumably, providing a guarantee that 0 is returned requires a runtime check when dimindx is a variable. Does OpenCL assume that it will almost always be a constant? Specifying UB would require the least overhead.
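The overhead argument can be sketched as follows, again in plain C++ with a hypothetical `mock_item_ct` type rather than the real `nd_item`. The templated getter is an assumption made for illustration and is not part of the SYCL API. It shows that a constant dimension can be validated at compile time for free, while an unchecked runtime getter has no branch but relies on the caller passing a valid dimension.

```cpp
#include <array>
#include <cstddef>

// Hypothetical type for illustration only; not the SYCL nd_item class.
template <int Dims>
struct mock_item_ct {
  std::array<std::size_t, Dims> global_id;

  // Dimension known at compile time: an out-of-range value fails to compile,
  // so no runtime check is needed.
  template <int Dimension>
  std::size_t get_global_id() const {
    static_assert(Dimension >= 0 && Dimension < Dims, "dimension out of range");
    return global_id[static_cast<std::size_t>(Dimension)];
  }

  // Dimension known only at run time, UB-style contract: no branch, and the
  // caller must guarantee 0 <= dimension < Dims.
  std::size_t get_global_id_unchecked(int dimension) const {
    return global_id[static_cast<std::size_t>(dimension)];
  }
};
```

Here `it.get_global_id<1>()` on a `mock_item_ct<1>` would be rejected by the compiler, whereas `it.get_global_id_unchecked(1)` compiles and has undefined behaviour.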

The WG has some concerns about mandating that get_global_id(int dim) returns 0 when dim is out of bounds:

  • If we guarantee that get_global_id(int) does an out-of-bounds check, what about id[n]? Does the id subscript operator also need to do an out-of-bounds check?

  • Some implementations (e.g. safety-critical ones) might prefer to abort the kernel if get_global_id(int) is out of bounds. If the spec mandates that it must return 0 in this case, such an implementation would not be conformant. If the spec says that the behavior is UB in this case, that allows an implementation to report an error if it wants.
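Both concerns can be illustrated with a small plain C++ sketch. Everything in it is hypothetical: `mock_id` and `oob_policy` are not SYCL names, and the exception is only a host-side stand-in for whatever error mechanism an implementation might choose (SYCL device code cannot throw). It shows that if the spec leaves the out-of-bounds case undefined, one implementation may skip the check while another reports an error, and the subscript operator and the getter can share the same policy.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical policy switch: what an implementation does for an
// out-of-bounds dimension when the spec leaves the behaviour undefined.
enum class oob_policy { unchecked, report_error };

// Hypothetical stand-in for sycl::id; not the real class.
template <int Dims, oob_policy Policy>
struct mock_id {
  std::array<std::size_t, Dims> values;

  std::size_t operator[](int dimension) const {
    if constexpr (Policy == oob_policy::report_error) {
      // A checking implementation is free to detect the error and report it.
      if (dimension < 0 || dimension >= Dims) {
        throw std::out_of_range("dimension out of range");
      }
    }
    // An unchecked implementation relies on the caller for validity.
    return values[static_cast<std::size_t>(dimension)];
  }

  // The getter reuses the subscript operator, so both follow one policy.
  std::size_t get(int dimension) const { return (*this)[dimension]; }
};
```

Under a mandated "return 0" rule, the `report_error` variant would be non-conformant; under UB, both variants are allowed.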