PyO3/rust-numpy

PyArray1::<bool>::as_slice is unsound

Opened this issue · 5 comments

orlp commented

PyArray1::<bool>::as_slice returns a &[bool], but the underlying data may not at all be valid for Rust's bool, creating instant undefined behavior. For example

>>> a = np.uint8([24, 15, 32, 1, 0]).view(bool)
>>> a
array([ True,  True,  True,  True, False])

Calling as_slice on this from Rust reveals the invalid representation:

dbg!(arr.as_slice().unwrap());
dbg!(unsafe { core::mem::transmute::<&[bool], &[u8]>(arr.as_slice().unwrap()) });
[ false, true, false, true, false, ]
[ 24, 15, 32, 1, 0, ]

Thanks for reaching out. This looks unfortunate. I do wonder however how sound that Python implementation is in itself. Our bool array implementation is based on NPY_BOOL

impl_element_scalar!(bool => NPY_BOOL);

In its documentation it's pretty clearly mentioned that its byte values may only ever by 0 or 1, just like Rust's bool. Under this assumption I believe our implementation to be sound. And if we start with a boolean array this invariant will also be upheld:

>>> b = np.array([True, False])
>>> b.view(np.uint8)
array([1, 0], dtype=uint8)
>>> b[1] = 42
>>> b.view(np.uint8)
array([1, 1], dtype=uint8)

So view in some sense should be considered unsafe because it can be used to violate this assumption.

There are other functions that are problematic in this sense, for example numpy.empty. Accessing uninitialized memory without MaybeUninit is also unsound even if the type can hold any bit pattern. There is an implicit safety contract that the Python code is assumed to be well behaved, i.e. not hand us uninitialized memory or break type invariant. I don't think there is a practical way for us to guard against this (but I'm happy to hear otherwise)

orlp commented

It is unfortunate that np.view doesn't document what kind of views are 'allowed' and which would be undefined behavior. Then again, np.empty is essentially instant undefined behavior anywhere so there's a certain lack of care from the numpy project in this regardless.

I'm perfectly happy to chalk this up to np.view having an implicit unsafe bound. That said, I'll keep the the workaround I used in Polars so that we match numpy's behavior of 0 = false, 1+ = true, regardless of what the documentation says.

So view in some sense should be considered unsafe because it can be used to violate this assumption.

IMO rust-numpy should add checks like what OP proposes to the bindings for view and empty. Or mark them as unsafe.

orlp commented

@ngoldbaum The view isn't called from Rust code, that happens in Python. The Rust code just gets an ndarray and has no idea if it's in a valid state or not.

Ah, fair enough. Seems like another issue similar to what Alex Gaynor describes regarding the buffer protocol: https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/

Numpy could opt to add more checks to support Rust's safety needs, but that might break uses that don't use Rust or obey Rust's rules that rely on that unsafety.

It may be the case that rust-numpy can't safely assume the things it does about the sanity of data in numpy arrays and add runtime checks...