RFC: add support for querying whether an array library explicitly supports strides
kgryte opened this issue · 7 comments
Overview
The concept of strides has been raised several times since the beginning of the Consortium. This history was summarized in #641. In short, because several array libraries don't have an explicit concept of "strides" and strides can be considered an implementation detail, the Consortium has been intentional about avoiding any formal standardization of stride APIs.
Nevertheless, for a subset of array-consuming libraries (such as SciPy and scikit-learn), being able to leverage (and manipulate) strides can confer meaningful performance advantages. As a result, downstream libraries wanting to squeeze out additional performance are likely to special-case the arrays of recognized libraries. This special casing is fine so long as only a small number of array libraries is supported; however, it puts stride-supporting libraries that are not among the privileged few at a material disadvantage.
Proposal
Now that the array API standard has an inspection API, this RFC seeks to propose adding support for querying whether a conforming array library has explicit support for array strides. Specifically, the following field should be added to the object returned by `info.capabilities()`:

```
{
    ...
    "max rank": Optional[int],
+   "strides": bool
}
```
With this API in place, downstream libraries could query whether a conforming array library supports strides and special case accordingly.
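For illustration, a downstream consumer might wrap the check like this; a minimal sketch in which the `supports_strides` helper is hypothetical and the `"strides"` key is the field proposed above (it is not part of the current standard):

```python
def supports_strides(xp):
    """Check whether the array namespace `xp` reports explicit stride support.

    Sketch only: the "strides" capability is the field proposed in this RFC
    and does not exist in the standard today; .get() keeps the check safe on
    namespaces that do not (yet) report it.
    """
    info = xp.__array_namespace_info__()
    return bool(info.capabilities().get("strides", False))


# Usage: special-case the fast strided path only when the namespace reports support.
# if supports_strides(xp):
#     ...  # fast path relying on stride manipulation
# else:
#     ...  # generic fallback
```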
Questions
- Does the proposed inspection API provide enough value to downstream libraries to justify inclusion? Notably, this proposal does not seek to standardize any keywords or stride-specific APIs (e.g., `stride_tricks`), and, thus, downstream libraries would still be left to check for specific API support. That stated, this may be enough of a hook to foster de facto community standardization, which, among stride-supporting libraries, already seems to exist.
- As discussed in #571, DLPack allows access to strides; however, this API is a bit cumbersome for downstream libraries to use just to access strides (see the sketch below). Nevertheless, maybe this is enough?
- A related question: if we decide to move forward with this proposal, does it make sense to also add support for querying whether an array library supports an array "order" (e.g., "row-major" or "column-major")? This is another implementation detail which downstream array libraries hoping to adopt the standard have raised, with particular concern about performance implications. Especially for CPU-bound dense array libraries which commonly interface with libraries written in other languages (C/C++/Fortran) and which assume a specific memory layout (row or column major), ensuring that created arrays have a particular layout can confer performance advantages.
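As a rough illustration of why the DLPack route is cumbersome, here is a hypothetical ctypes sketch that reads the strides out of a DLPack capsule. The struct layouts follow the DLPack `DLTensor`/`DLManagedTensor` definitions; production code would additionally need to honor capsule ownership (renaming the capsule, calling the deleter) and the newer versioned-capsule protocol:

```python
import ctypes

class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int32), ("device_id", ctypes.c_int32)]

class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8), ("bits", ctypes.c_uint8), ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", DLDevice),
        ("ndim", ctypes.c_int32),
        ("dtype", DLDataType),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("strides", ctypes.POINTER(ctypes.c_int64)),
        ("byte_offset", ctypes.c_uint64),
    ]

class DLManagedTensor(ctypes.Structure):
    pass

DLManagedTensor._fields_ = [
    ("dl_tensor", DLTensor),
    ("manager_ctx", ctypes.c_void_p),
    ("deleter", ctypes.CFUNCTYPE(None, ctypes.POINTER(DLManagedTensor))),
]

def strides_from_dlpack(x):
    """Return the strides of `x` (in element counts, per DLPack), or None if compact."""
    capsule = x.__dlpack__()
    get = ctypes.pythonapi.PyCapsule_GetPointer
    get.restype = ctypes.c_void_p
    get.argtypes = [ctypes.py_object, ctypes.c_char_p]
    ptr = get(capsule, b"dltensor")
    tensor = ctypes.cast(ptr, ctypes.POINTER(DLManagedTensor)).contents.dl_tensor
    if not tensor.strides:          # NULL strides means compact, row-major data
        return None
    return tuple(tensor.strides[i] for i in range(tensor.ndim))
```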
I'm not sure how this will help, given:
> Does the proposed inspection API provide enough value to downstream libraries to justify inclusion? Notably, this proposal does not seek to standardize any keywords or stride-specific APIs [...]

If anything that has to do with strides has to be queried for with `hasattr` anyway, then doing `has_strides = hasattr(x, 'strides')` should be fine too.
The discussion in gh-641 points at defining APIs like `sliding_window_view` for the operations that people want to manipulate strides for. That seems like an important exercise, and is probably needed before deciding on this proposal. A good start would be to try using `sliding_window_view` for everything in SciPy and scikit-learn that manipulates strides, and seeing how far that gets us.
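For instance, a moving-window reduction, one of the common reasons for reaching for stride tricks, can be written without touching strides directly. This is an illustrative NumPy snippet, not code from SciPy or scikit-learn:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10, dtype=np.float64)
windows = sliding_window_view(x, window_shape=4)  # strided view, no copy
moving_avg = windows.mean(axis=-1)                # shape (7,)
```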
> > Does the proposed inspection API provide enough value to downstream libraries to justify inclusion? Notably, this proposal does not seek to standardize any keywords or stride-specific APIs [...]
>
> If anything that has to do with strides has to be queried for with `hasattr` anyway, then doing `has_strides = hasattr(x, 'strides')` should be fine too.
From a Python library perspective, I am supportive of this RFC. `hasattr(x, 'strides')` assumes there's an attribute named `strides`, which may or may not be the case. For example, PyTorch makes it a *method* without the "s": `torch.Tensor.stride()`. While this RFC does not answer the question "if a library has strides, how do I query it?", sometimes a yes/no is enough (and if yes, we could then proceed to get the strides via `.strides`, from a DLPack capsule, etc.). Granted, this RFC doesn't answer this question either: "are the returned strides in units of bytes or element counts?", but DLPack has that defined.
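To make the unit discrepancy concrete (a hedged example; the PyTorch lines are commented out in case it is not installed):

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)
print(a.strides)     # (32, 8): NumPy reports strides in bytes

# import torch
# t = torch.zeros(3, 4, dtype=torch.float64)
# print(t.stride())  # (4, 1): PyTorch reports strides in element counts,
#                    # which is also the unit DLPack uses
```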
If DLPack already exposes strides, then is it possible to do whatever stride tricks you would want to do with DLPack?
If we want to say "having `__dlpack__` and `__dlpack_device__` implemented is the proxy to check if strides exist," I am fine with that too, but we should document it. But then, it seems unintuitive for general users. Thus, having a queryable yes/no answer is, to me, a good compromise.
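For concreteness, the proxy check being discussed would amount to something like this hypothetical helper (which, as the next comment argues, is not a reliable signal):

```python
def maybe_has_strides(x):
    # Treat DLPack support as a proxy for "this library has strides".
    # Unreliable: a library can implement __dlpack__ while still copying
    # on every slicing operation (see the JAX example below).
    return hasattr(x, "__dlpack__") and hasattr(x, "__dlpack_device__")
```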
> `hasattr(x, 'strides')` assumes there's an attribute named `strides`, which may or may not be the case. For example, PyTorch makes it a *method* without the "s": `torch.Tensor.stride()`. While this RFC does not answer the question "if a library has strides, how do I query it?", sometimes a yes/no is enough
That's not really a good reason. A simple utility function like this will do:
```python
def has_strides(x):
    return hasattr(x, 'strides') or hasattr(x, 'stride')
```
There are significant downsides to introducing strides as a concept in the standard, and zero upsides as far as I can tell, aside from avoiding having to carry a simple utility function like this. Also, this utility function is going to work for older versions of NumPy and other array libraries (which everyone needs to support), while a new inspection API won't.
> If we want to say "having `__dlpack__` and `__dlpack_device__` implemented is the proxy to check if strides exist,"
That doesn't work, I think. These methods are implementable by most libraries, and strides will then always be defined (even if not accessible from Python), including when the library does not offer non-contiguous arrays. Concrete example: JAX arrays have a `__dlpack__` method, but `x[::2]` still copies. What folks actually want is "can I do things with strided views efficiently, without triggering copies?"
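A small illustration of that difference; the NumPy half runs as-is, while the JAX lines are left as comments in case JAX is not installed:

```python
import numpy as np

x = np.arange(8)
y = x[::2]
print(y.base is x)   # True: basic slicing returns a strided view, no copy

# import jax.numpy as jnp
# xj = jnp.arange(8)
# yj = xj[::2]       # produces a new buffer: __dlpack__ exists, yet slicing
#                    # does not give a zero-copy strided view
```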
Or maybe it's more like there are several different capabilities here that may require determining:

- Are non-contiguous ("strided") arrays supported?
- Is it possible to do memory-efficient data read operations (a la `numpy.lib.stride_tricks.as_strided`; see the snippet below)?
- ... ?
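For the second bullet, the kind of memory-efficient read meant here looks roughly like this (illustrative NumPy snippet):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(6, dtype=np.float64)          # 8-byte elements
# Overlapping length-3 windows over x, without copying any data.
windows = as_strided(x, shape=(4, 3), strides=(8, 8))
```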
Looks like I dropped the ball... 😅
From my perspective as a low-level library developer trying to interface with Python (e.g. nvmath-python is live), I don't care whether DLPack or the array library describes a view or a copy. All I care about is having a way to extract the memory layout from an array/tensor and pass it to my C libraries when invoking the math ops, and for that I need the strides information, not just the pointer. I don't want/need to pick sides on whether to support views or the strides concept in Python. Does that make better sense?
> Does that make better sense?
Yes, I think so. Based on the discussion above, my conclusion is that we should add some language on strides to the "Design topics and constraints" section, with a pointer to DLPack for anyone who needs to get at strides information (as in Leo's case). This should be enough to satisfy most needs, even if a bit unintuitive, while not introducing the concept of strides into the standard.