Segfault from getitem with array
ngoldbaum opened this issue · 6 comments
```python
import os

os.environ["NUMPY_EXPERIMENTAL_DTYPE_API"] = "1"

import numpy as np
from stringdtype import StringDType

arr = np.array(["a", "b", "c", "d", "e"], dtype=StringDType())
arr[np.array([1, 2, 3])]
```

This script segfaults with the following stack trace (further frames elided for clarity):
```
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff64fc7c4 in mapiter_trivial_get (self=0x7ffff57ef340,
    ind=0x7ffff57ef1c0, result=0x7ffff57ef3c0)
    at numpy/core/src/multiarray/lowlevel_strided_loops.c.src:1531
#2  0x00007ffff6503212 in array_subscript (self=0x7ffff57ef340,
    op=<numpy.ndarray at remote 0x7ffff57ef1c0>)
    at numpy/core/src/multiarray/mapping.c:1568
#3  0x00007ffff7affd01 in PyObject_GetItem (
    o=<numpy.ndarray at remote 0x7ffff57ef340>,
    key=<numpy.ndarray at remote 0x7ffff57ef1c0>) at Objects/abstract.c:158
```
`mapiter_trivial_get` is calling `copyswap`:

```
#1  0x00007ffff64fc7c4 in mapiter_trivial_get (self=0x7ffff57ef340,
    ind=0x7ffff57ef1c0, result=0x7ffff57ef3c0)
    at numpy/core/src/multiarray/lowlevel_strided_loops.c.src:1531
1531            copyswap(result_ptr, self_ptr, 0, self);
```
Ah, the issue is that the dtype doesn't have a `copyswap` implementation:

```
(gdb) p copyswap
$8 = (PyArray_CopySwapFunc *) 0x0
```
@seberg should we make `copyswap` mandatory for dtypes? Or might there be a way to implement it generically, so we could at least provide a default implementation?
Ah that's right, numpy/numpy#23486 is the same issue. The `copyswap` call will need to be replaced with a cast, as in `raw_array_assign_scalar`.
There may be a point to having a simpler single-item copy function. But for parametric dtypes you really want that on the instance (the normal cast function, like the one you used for `repeat`, is now much faster for structured dtypes when used multiple times).
So in the end, I am not sure that a "single-item copy" function is very different from using the normal cast function with `n=1`, or even passing the `n=1` information in to specialize it even further (though I am not sure how useful that actually is).
(Except if you call it only once, the setup is annoying; but with `copyswap` it's annoying that "swapping" is hardcoded and that it takes an array object for non-trivial dtypes.)
So I guess my opinion on this is:

- For NumPy, I would prefer to stop using `copyswap`, including (and maybe especially) here. The pattern you used for `repeat` should do well here also (with the same tradeoff: structured dtypes would get much faster, object a tiny bit slower).
- But... it may make sense to provide a default `copyswap` function, although I doubt anyone uses it besides for `dtype=object` in SciPy. (It is probably of limited practical use for parametric dtypes, and I am not sure about swapping either, but I doubt it would be used much in practice.)
- Although... if NumPy doesn't use `copyswap` anymore, a simple helper in our headers that makes single-item copy easy to set up may be more useful than even bothering to support `copyswap`?
So, for simple advanced indexing, I am seeing a slowdown of about 30% using the normal cast loop here rather than `copyswap`, which feels a bit larger than what we saw in the fix for `repeat` (a somewhat different pattern, admittedly).
I can reduce that to <10% by specializing the loop to a single element. One approach may be to change things everywhere to use 0 as the fixed stride (so a scalar version can be selected; copying can assume `n == 1` in that case). That might not be the best trade-off when a contiguous version exists but no 0-stride version, but for most relevant paths we should use the inlined small-size copy anyway.
(We could of course always inline/special-case object for advanced indexing; the theoretical speed of a bare loop is about 15-20% faster on my computer than `copyswap`. In a sense that means that just using the strided loop makes it 50% slower than the theoretical best, which honestly doesn't seem terrible...)
We could reduce it further by passing fewer things in, I guess (no need for strides). But then we would diverge from the ufunc signature and either force providing the scalar version or require another branch in the places where we use it.
@ngoldbaum what do you think? Just take the 33% or slightly higher slowdown for now and use the strided loop? (I also thought about adding a contiguous special copy for objects, but it only reduces the slowdown to 25% and has no effect on larger copies.)
> Although... if NumPy doesn't use `copyswap` anymore, a simple helper in our headers that makes single-item copy easy to set up may be more useful than even bothering to support `copyswap`?
I think we should provide a simple one-liner to replace downstream uses of `copyswap`. It's best to err on the side of downstream C API users not having the bandwidth for large-scale refactoring, even if there is a performance benefit. I think a function that simply hard-codes an `n=1` cast and incurs the setup overhead is fine. We can also point to the new casting API where `copyswap` is being called repeatedly and unnecessarily.
> Just take the 33% or slightly higher slowdown for now and use the strided loop?
I'm going to try to understand the rest of your comment today and play a bit with your numpy PR. Is the slowdown only for `REFCHK` dtypes? If so, it's totally fine, since this simple advanced-indexing branch is broken, and going from broken to slow is definitely an improvement. For object dtypes specifically, is it feasible to leave in a branch that still does `copyswap` so we don't regress performance in that case? The 33% slowdown for object dtypes is the most concerning aspect of this to me.
Closing with the expectation that Sebastian's numpy PR will be merged.