numpy/numpy-user-dtypes

`PyArray_Item_XDECREF` fails with user dtypes that have `NPY_ITEM_REFCOUNT` flag

peytondmurray opened this issue · 4 comments

When calling arr.fill() on an array of dtype StringDType, a failed assertion causes a coredump. Here's a reproducer:

import numpy as np
from stringdtype import StringDType

x = np.array(['a', 'b', 'c'], dtype=StringDType())

x.fill('f')
print(x)

This happens because when x.fill('f') is called, array_fill is executed, which calls PyArray_FillWithScalar, which tries to XDECREF all objects in the object that fills the array using PyArray_Item_XDECREF. This fails because dtypes of different varieties are handled differently, and type_num is used to check what kind of dtype the input is. There's no case for when the dtype is a user dtype, and a safety assertion is triggered, stopping execution.

In short we need to handle the case when user dtypes with the NPY_ITEM_REFCOUNT are passed into PyArray_Item_XDECREF. I guess in principle this issue belongs on the numpy repo, so I can move it there if there's not a workaround.

Yes, my preferred fix for this will be to remove all occurances of PyArray_Item_XDECREF one by one. Fill is an important one I guess.

Looking at this today.

I noticed that there is a similar assert(0) in PyArray_Item_INCREF, what should we do about that case? Should we have e.g. a NPT_DT_get_incref_loop API slot for user dtypes? I don't need this for stringdtype, since it only makes sense to do increfs for dtypes holding references to python objects. I could also make the assert(0) only trigger for legacy dtypes, but then that's leaving a footgun for someone writing a dtype in the future that holds references to python objects.

My hope on that front is that the use of PyArray_Item_INCREF is always an anti-pattern. Someone is copying data and only then incref'ing the reference in that data. That works well for reference counted Python, but otherwise it gets tricky and it may also translate well to e.g. HPy in any case.

So, I think we should try to refactor out all uses of INCREF.

Closing as numpy/numpy#23484 is merged.