numpy/numpy

`numpy.histogram` is returning an array of `int64`, which is breaking compatibility with bindings/C wrapper code

Closed this issue · 7 comments

tadeu commented

Due to this change:

34b582a#diff-9d44603693b9544b900e135a203b3ae2R180

    # Histogram is an integer or a float array depending on the weights.
    if weights is None:
        ntype = np.dtype(np.intp)  # <---
    else:
        ntype = weights.dtype

numpy.histogram is choosing intp as the return type for the histogram values. This is breaking compatibility with wrapper code that expects simple ints (int32 are fine, as are python long, but usually C wrapper code can't handle numpy.int64). What is worse is that the behaviour is platform-dependent (i.e., it keeps working on 32-bit platforms).

Could it be changed to simply use int as dtype (i.e., would use numpy.int32)?

In a 64 bit system you could have an input to histogram having more than 2**31 - 1 identical values, which would result in an int32 overflowing. So I think it is pretty much uncontroversial that the right thing to do is to go with np.intp.

It's hard to tell without knowing more details, but the issue here seems to be that your wrapper code is not ready to work on 64 bit systems, and you will need to adapt it to fix that, not expect NumPy to sacrifice correctness for your convenience.

tadeu commented

I'm not asking to sacrifice correctness: the case when there are more than 2**31 - 1 identical values seems to be so rare that the resulting array could be converted back to int32 whenever all its elements fit on int32s (it would be a very minor performance sacrifice, at most).

It is not about the wrapper code being ready to work on 64-bit systems (it is already), it is about wrapper code being able to handle the numpy.int64 types. In this case, the wrapper code is generated using SIP, but the main problem is that integers are usually handled in C by standard Python C-API functions like PyLong_AsLong. In order to understand the numpy.int64, the wrapper would need to depend on NumPy C API now, and use a function like PyArray_ScalarAsCtype.

tadeu commented

I've already fixed this on user code side, but I've created the issue to let you know of this possible problem and open a discussion. If there are no other compatibility problems related to this change, then I'm ok if you want to close the issue.

@tadeu: not sure if this affects anything, but note for your reference that this problem only affects win64 -- on every other 32- or 64-bit platform, intp and int are the same.

rkern commented

You generally want to use PyNumber API instead of the PyLong API if you want to be able to use numpy scalar objects. You don't need to use the numpy API.

Closing. Please reopen if you think more needs to be done.