Toblerity/rtree

Apparent C API mismatch causes test failure on big-endian architecture (s390x)

musicinmybrain opened this issue · 13 comments

This comes from a downstream Fedora Linux bug report, in which tests/test_index.py::IndexSerialization::test_interleaving and tests/test_index.py::IndexStream::test_stream_input were failing on s390x (Fedora’s only big-endian primary architecture). This wasn’t noticed for a long time because the python-Rtree package is noarch and is normally built once on any available builder rather than once per architecture.

The submitter concluded:

I think I found the bug here - the size_t is getting chopped in half on 64-bit platforms, and that matters more for big endian machines.

I think the prototypes are wrong for size_t values in other signatures, but this seems to be the only one causing test failures. Maybe it only matters if it's a size_t pointer - I'm not sure.

They also pointed out that the change was intentional, #77, and claimed to fix a mismatch—but the callback signature in question appears to have always had a size_t * as its last parameter in libspatialindex.

The following change fixed the test failure:

diff -Naur Rtree-0.9.7-original/rtree/core.py Rtree-0.9.7/rtree/core.py
--- Rtree-0.9.7-original/rtree/core.py	2020-12-24 07:38:19.000000000 -0800
+++ Rtree-0.9.7/rtree/core.py	2022-02-16 16:11:36.115401345 -0800
@@ -99,7 +99,7 @@
                             ctypes.POINTER(ctypes.POINTER(ctypes.c_double)),
                             ctypes.POINTER(ctypes.c_uint32),
                             ctypes.POINTER(ctypes.POINTER(ctypes.c_ubyte)),
-                            ctypes.POINTER(ctypes.c_uint32))
+                            ctypes.POINTER(ctypes.c_size_t))
 
 rt.Index_CreateWithStream.argtypes = [ctypes.c_void_p, NEXTFUNC]
 rt.Index_CreateWithStream.restype = ctypes.c_void_p

However, the submitter noted that there seemed to be other prototype mismatches with libspatialindex’s C API. I audited the wrapped functions and will report my findings in comments on this issue.

Of the Error_… functions, the following are exported in the shared library, but have no prototype in any API header in libspatialindex. (Only Error_GetLastErrorMsg is in the headers.) I’m not sure if this is an oversight or intentional in libspatialindex. It’s still safe enough to wrap them as long as their signatures remain stable in practice, of course.

  • Error_GetLastErrorNum
  • Error_GetLastErrorMethod
  • Error_GetErrorCount
  • Error_Reset

Of these, all are (void) in C, but only some set […].argtypes = []. While the prototypes without this are not wrong, it’s akin to declaring () instead of (void) in C. it seems like it would be worthwhile to add

rt.Error_GetLastErrorNum.argtypes = []

and

rt.Error_GetLastErrorMethod.argtypes = []

to make these prototypes stricter.

It seems like the use of c_void_p instead of explicitly defining struct Tools_PropertySet/struct IndexS and the typedefs IndexPropertyH/IndexH is an intentional design decision. There should be no problem with type erasure here as long as the pointers point to the correct types in practice.

As noted in the beginning of this report, the last parameter of

NEXTFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                            ctypes.POINTER(ctypes.c_int64),
                            ctypes.POINTER(ctypes.POINTER(ctypes.c_double)),
                            ctypes.POINTER(ctypes.POINTER(ctypes.c_double)),
                            ctypes.POINTER(ctypes.c_uint32),
                            ctypes.POINTER(ctypes.POINTER(ctypes.c_ubyte)),
                            ctypes.POINTER(ctypes.c_uint32))

is uint32_t, which does not match size_t in the C header in the callback function pointer definition for Index_CreateWithStream:

int (*readNext)(int64_t *id, double **pMin, double **pMax, uint32_t *nDimension, const uint8_t **pData, size_t *nDataLength)

Similarly, with value parameters instead of pointer ones,

rt.SIDX_NewBuffer.argtypes = [ctypes.c_uint]

does not match

SIDX_C_DLL void* SIDX_NewBuffer(size_t bytes);

and

    rt.Index_InsertTPData.argtypes = [ctypes.c_void_p,
                                      ctypes.c_int64,
                                      ctypes.POINTER(ctypes.c_double),
                                      ctypes.POINTER(ctypes.c_double),
                                      ctypes.POINTER(ctypes.c_double),
                                      ctypes.POINTER(ctypes.c_double),
                                      ctypes.c_double,
                                      ctypes.c_double,
                                      ctypes.c_uint32,
                                      ctypes.POINTER(ctypes.c_ubyte),
                                      ctypes.c_uint32]

differs in the last parameter’s type (uint32_t vs. size_t) from

SIDX_C_DLL RTError Index_InsertTPData( IndexH index,
  int64_t id,
  double* pdMin,
  double* pdMax,
  double* pdVMin,
  double* pdVMax,
  double tStart,
  double tEnd,
  uint32_t nDimension,
  const uint8_t* pData,
  size_t nDataLength);

In rt.IndexProperty_SetIndexType, ctypes.c_int32 is used for enum RTIndexType, but elsewhere enums are generally wrapped as ctypes.c_int—including RTError in the same functions, and even for enum RTIndexType in rt.IndexProperty_GetIndexType. This is probably of no consequence on common platforms, since sizeof(int) == 4 is generally true.

In rt.IndexProperty_SetIndexVariant and IndexProperty_SetIndexStorage, ctypes.c_uint32 is used for enum RTIndexVariant and enum RTStorageType. This may or may not work in practice for nonnegative enum values, but it doesn’t seem to be technically correct. Again, ctypes.c_int is used for the same enums in the corresponding getters.

rt.IndexProperty_GetDimension.restype = ctypes.c_int

does not match

SIDX_DLL uint32_t IndexProperty_GetDimension(IndexPropertyH iprop);

and the following have the same problem:

  • IndexProperty_GetPagesize
  • IndexProperty_GetIndexCapacity
  • IndexProperty_GetLeafCapacity
  • IndexProperty_GetLeafPoolCapacity
  • IndexProperty_GetIndexPoolCapacity
  • IndexProperty_GetRegionPoolCapacity
  • IndexProperty_GetPointPoolCapacity
  • IndexProperty_GetBufferingCapacity
  • IndexProperty_GetEnsureTightMBRs
  • IndexProperty_GetOverwrite
  • IndexProperty_GetNearMinimumOverlapFactor
  • IndexProperty_GetWriteThrough

The definition of rt.Index_TPIntersects_count is incomplete; the argtypes are set, but these lines are missing:

    rt.Index_TPIntersects_count.restype = ctypes.c_int
    rt.Index_TPIntersects_count.errcheck = check_return

After allowing some time for comments, I’m happy to follow up with one or more PR’s for some or all of the things I noticed.

hobu commented

We would love PRs on these issues. The discontinuities in definitions here represent years of libspatialindex and Rtree tick-tock, and because there's no type checking on ctypes, things like this can slip through.

Please make PRs to address these issues to ensure a minimum version of libspatialindex 1.8.5 #214 and we'll be happy to merge them.

If you are really ambitious, I would love for someone to convert this from ctypes to pybind11 so these kinds of issues are compilation errors instead of silent failures.

Great! I’ll plan to begin by correcting the ctypes wrapper. Do you have a preference for separate PR’s, or one PR with multiple commits? I’d lean toward the latter if it doesn’t matter to you.

After that, I will probably take a look at pybind11 when I have a little time. It shouldn’t be too much of an undertaking.

hobu commented

A big one is just fine.

We are about ready to release Rtree after all of @adamjstewart recent commits. We will plan to include your ctypes fixes in that release, and then if you want to explore the pybind11 refactor, we can do that in the next version.

The nice thing about pytypes11 is @adamjstewart worries about type hinting that codebase can fall away.

Please make PRs to address these issues to ensure a minimum version of libspatialindex 1.8.5 #214 and we'll be happy to merge them.

Shall I add a runtime version test using

SIDX_C_DLL char* SIDX_Version(void);

to finder.py?

If so, can I expect that an acceptable version string is always of the form X.Y.Z with X, Y, and Z decimal integers? If this is expected to be true now and in the future, but wasn’t true at some point in the past, I can always treat deviations from this pattern as unacceptable versions.

Using packaging.version.parse would also be an easy option here, although it would add a dependency.

hobu commented

Shall I add a runtime version test using

I think it would be very rare that anyone would be using anything older than 1.8.5 at this point. I don't know if the runtime test is needed.

The reason the tests are failing for #222 appears to be that, prior to 1.9.0, the callback signature in the implementation for Index_CreateWithStream in libspatialite didn’t match the prototype in the headers. So even though size_t* seems to have always been the last parameter’s type in the prototype, the compiled code is treating it as uint32_t*.
Now I understand why the type in the ctypes wrapper was changed to disagree with the header!

Here is the commit where that was corrected: libspatialindex/libspatialindex@6d139aa

Since this makes libspatialite 1.8.5 and libspatialite 1.9.x ABI-incompatible (also reflected in the fact that the shared library so-version has been bumped twice since 1.8.5), I think it will be necessary for the bindings to define prototypes conditionally based on runtime version detection if they must support libspatialite versions older than 1.9.0.

(Presumably if we migrate to pybind11 we will have to support only the latest ABI version, i.e. 1.9.1 or later.)