`numpy.histogram` is returning an array of `int64`, which is breaking compatibility with bindings/C wrapper code
Closed this issue · 7 comments
Due to this change:
34b582a#diff-9d44603693b9544b900e135a203b3ae2R180
# Histogram is an integer or a float array depending on the weights.
if weights is None:
ntype = np.dtype(np.intp) # <---
else:
ntype = weights.dtype
numpy.histogram
is choosing intp
as the return type for the histogram values. This is breaking compatibility with wrapper code that expects simple int
s (int32
are fine, as are python long
, but usually C wrapper code can't handle numpy.int64
). What is worse is that the behaviour is platform-dependent (i.e., it keeps working on 32-bit platforms).
Could it be changed to simply use int
as dtype
(i.e., would use numpy.int32
)?
cc @astrofrog
In a 64 bit system you could have an input to histogram having more than 2**31 - 1
identical values, which would result in an int32
overflowing. So I think it is pretty much uncontroversial that the right thing to do is to go with np.intp
.
It's hard to tell without knowing more details, but the issue here seems to be that your wrapper code is not ready to work on 64 bit systems, and you will need to adapt it to fix that, not expect NumPy to sacrifice correctness for your convenience.
I'm not asking to sacrifice correctness: the case when there are more than 2**31 - 1
identical values seems to be so rare that the resulting array could be converted back to int32
whenever all its elements fit on int32
s (it would be a very minor performance sacrifice, at most).
It is not about the wrapper code being ready to work on 64-bit systems (it is already), it is about wrapper code being able to handle the numpy.int64
types. In this case, the wrapper code is generated using SIP, but the main problem is that integers are usually handled in C by standard Python C-API functions like PyLong_AsLong
. In order to understand the numpy.int64
, the wrapper would need to depend on NumPy C API now, and use a function like PyArray_ScalarAsCtype
.
I've already fixed this on user code side, but I've created the issue to let you know of this possible problem and open a discussion. If there are no other compatibility problems related to this change, then I'm ok if you want to close the issue.
@tadeu: not sure if this affects anything, but note for your reference that this problem only affects win64 -- on every other 32- or 64-bit platform, intp and int are the same.
You generally want to use PyNumber
API instead of the PyLong
API if you want to be able to use numpy scalar objects. You don't need to use the numpy API.
Closing. Please reopen if you think more needs to be done.