pearu/pylibtiff

SegFault on Mac OSX with M2 chip / Ventura 13.4 trying to open TIFF Files

dgutman opened this issue · 12 comments

I have libtiff 4.6.0 installed on my Mac running macOS Ventura 13.4. In another application that ultimately uses pylibtiff, I started getting a segfault. After some initial digging, it seems that pylibtiff is causing the issue, although I'm a bit stumped, as I've used the library in the past.

Simply running tiffinfo from the command line (see below), I am able to view the basic info from the sample TIFF file. I am running Python 3.11.4 in a clean virtual environment, and

 import libtiff.libtiff_ctypes as lc
 f = lc.TIFF.open("WD84596_region_002_ACTG1.tif")
 f.GetField('bitspersample')

throws a segfault. Even just calling f.info() segfaults.

[Three screenshots attached]

How did you install pylibtiff? You mentioned you've used pylibtiff before. What has changed or is different from when it worked in the past?

Originally, pylibtiff was installed as a dependency of one of my other packages. It's been several months since I've done much testing, though, so I'm not sure whether I still have a virtual environment anywhere with a working local version...

In the scenario above, though, I just created a new virtual environment and installed pylibtiff directly from GitHub via pip install git+https://github.com/pearu/pylibtiff.git.

Check the logs of that pip install and make sure it completed successfully (i.e., the extensions were built successfully). I don't think the Cython extensions access the libtiff C library directly, so I'm not sure they are actually the problem. Oh yeah, you're using the ctypes interface... hmm.

Yeah, I'm going to try to dig a bit deeper today with various versions. I was just surprised to get a segfault, which makes it harder for me to debug and is a bit outside my normal comfort zone. I just hadn't seen anyone report anything similar.

If you run python and a snippet of your code with gdb (the C debugger) or some mac equivalent of strace, you might get some information about what exactly is causing the segfault. For example, some missing library that the libtiff library is trying to link to.
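Before reaching for a native debugger, Python's built-in faulthandler module can at least show which Python line was executing when the process crashed. A minimal sketch, not specific to pylibtiff; it only reports the Python-level traceback, not the C stack:

```python
import faulthandler

# Ask the interpreter to dump the Python traceback of all threads to stderr
# when the process receives a fatal signal such as SIGSEGV or SIGBUS.
faulthandler.enable()

# Sanity check that the handler is installed before running the crashing code.
print(faulthandler.is_enabled())
```

The same handler can be switched on without editing the script by running it as python -X faulthandler followed by the script name.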

Executive summary: I think this can be fixed by making sure that argtypes is specified with non-variadic arguments for all variadic functions being called via ctypes.

Details:
I fell down the rabbit hole on this a bit and have a preliminary fix, based on the theory that the crash results from calling-convention (ABI) differences between ARM64 and Intel for variadic functions.

I was able to reproduce the same (or a similar) issue with a venv derived from a MacPorts py311 build on an M2 Max system running macOS 14 (Sonoma), with tiff 4.6.0 also from MacPorts. In particular, it crashes with this stack trace:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libtiff.6.dylib                        0x106628c24 _TIFFVGetField + 1208
1   libtiff.6.dylib                        0x106627020 TIFFGetField + 28
2   libffi.8.dylib                         0x104fcc050 ffi_call_SYSV + 80
3   libffi.8.dylib                         0x104fc9548 ffi_call_int + 1432
4   _ctypes.cpython-311-darwin.so          0x104fa8860 _ctypes_callproc + 788
5   _ctypes.cpython-311-darwin.so          0x104fa34ac PyCFuncPtr_call + 220
6   Python                                 0x1051578d0 _PyObject_MakeTpCall + 128
7   Python                                 0x105235804 _PyEval_EvalFrameDefault + 41960
8   Python                                 0x10522a7d8 PyEval_EvalCode + 168
9   Python                                 0x10527d7bc run_eval_code_obj + 84
10  Python                                 0x10527d720 run_mod + 112
11  Python                                 0x10527d560 pyrun_file + 148
12  Python                                 0x10527cfb0 _PyRun_SimpleFileObject + 268
13  Python                                 0x10527c948 _PyRun_AnyFileObject + 216
14  Python                                 0x105299504 pymain_run_file_obj + 220
15  Python                                 0x105298e44 pymain_run_file + 72
16  Python                                 0x1052986f8 Py_RunMain + 660
17  Python                                 0x105299860 Py_BytesMain + 40
18  dyld                                   0x19eaf20e0 start + 2360

The crash registers include "byte write Translation fault":

Thread 0 crashed with ARM Thread State (64-bit):
    x0: 0x0000000106684680   x1: 0x0000000000000102   x2: 0x0000000000000000   x3: 0x000000010662876c
    x4: 0x0000000106629be8   x5: 0x000000011782143a   x6: 0x00000001054f2d98   x7: 0x0000000000000000
    x8: 0x0000000000000008   x9: 0x0000000000000000  x10: 0x000000016b696348  x11: 0x0000000106628964
   x12: 0x0000000000000064  x13: 0x0000000000000020  x14: 0x00000001053e5780  x15: 0x00000000ffff7dff
   x16: 0x000000019ed17b68  x17: 0x000000019ecac54c  x18: 0x0000000000000000  x19: 0x0000000106813a00
   x20: 0x0000000106684680  x21: 0x0000000000000102  x22: 0x0000000000000000  x23: 0x0000000000000000
   x24: 0x000000016b696538  x25: 0x0000000000000003  x26: 0x00000001071f4998  x27: 0x0000000000000003
   x28: 0x000000016b6964f0   fp: 0x000000016b696310   lr: 0x0000000106628794
    sp: 0x000000016b6962d0   pc: 0x0000000106628c24 cpsr: 0x80001000
   far: 0x0000000000000000  esr: 0x92000046 (Data Abort) byte write Translation fault

A bit of web searching around led me here: python/cpython#92892, which led to an update of the ctypes docs:

https://docs.python.org/3/library/ctypes.html#calling-variadic-functions

Digging around in the pylibtiff sources, I see that no argtypes are defined for the GetField call, since its arguments vary quite a bit depending on which tag is being fetched:

libtiff.TIFFIsMSB2LSB.restype = ctypes.c_int
libtiff.TIFFIsMSB2LSB.argtypes = [TIFF]

# GetField and SetField arguments are dependent on the tag
libtiff.TIFFGetField.restype = ctypes.c_int

libtiff.TIFFSetField.restype = ctypes.c_int

libtiff.TIFFNumberOfStrips.restype = c_tstrip_t

Then, rereading the ctypes docs more closely:

On those platforms it is required to specify the argtypes attribute for the regular, non-variadic, function arguments:

So I then edited libtiff_ctypes.py in my venv to declare the non-variadic arguments, based on the tiffio.h declarations of the TIFF*GetField functions:

# GetField and SetField arguments are dependent on the tag
libtiff.TIFFGetField.restype = ctypes.c_int
libtiff.TIFFGetField.argtypes = [TIFF, ctypes.c_uint32]

After that, my test program was able to GetField without segfault for things including:

print(f.GetField('BitsPerSample'))
print(f.GetField('ImageDescription'))
print(f.GetField('ImageWidth'))

I'm curious whether this also affects other arm64 systems such as the Raspberry Pi 4, but I haven't gotten a viable build on the system I have available as of this writing.
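The argtypes rule from the ctypes docs can be tried without libtiff at all. The sketch below uses libc's snprintf as a stand-in variadic function; it assumes a POSIX system where the C library can be loaded this way, and the format string and values are purely illustrative:

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) still exposes
# the symbols already loaded into the process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# int snprintf(char *buf, size_t size, const char *fmt, ...);
# Declare only the fixed (non-variadic) parameters, as the ctypes docs require.
libc.snprintf.restype = ctypes.c_int
libc.snprintf.argtypes = [ctypes.c_char_p, ctypes.c_size_t, ctypes.c_char_p]

buf = ctypes.create_string_buffer(32)

# Arguments beyond argtypes are the variadic ones; wrap them in explicit
# ctypes types so their C types are unambiguous.
written = libc.snprintf(
    buf, len(buf), b"%d bits, %s", ctypes.c_int(8), ctypes.c_char_p(b"tiff")
)
print(written, buf.value)
```

Because more arguments are passed than argtypes lists, ctypes can tell that the call is variadic and set it up accordingly, which is the same mechanism the TIFFGetField change relies on.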

Wow! Thanks for diving into the rabbit hole. Always glad to know I wasn't just doing something fundamentally dumb on my end. I'm not familiar enough with libtiff to provide much additional insight. Would this be relatively easy to patch in general, though, or would it require specifying argtypes for a huge number of parameters/functions?

I took a look through the code, and it may be that only two lines need adding; the change should not affect compatibility with other architectures.

diff --git a/libtiff/libtiff_ctypes.py b/libtiff/libtiff_ctypes.py
index be13a3a..8e85346 100644
--- a/libtiff/libtiff_ctypes.py
+++ b/libtiff/libtiff_ctypes.py
@@ -1895,8 +1895,10 @@ libtiff.TIFFIsMSB2LSB.argtypes = [TIFF]
 
 # GetField and SetField arguments are dependent on the tag
 libtiff.TIFFGetField.restype = ctypes.c_int
+libtiff.TIFFGetField.argtypes = [TIFF, ctypes.c_uint32]
 
 libtiff.TIFFSetField.restype = ctypes.c_int
+libtiff.TIFFSetField.argtypes = [TIFF, ctypes.c_uint32]
 
 libtiff.TIFFNumberOfStrips.restype = c_tstrip_t
 libtiff.TIFFNumberOfStrips.argtypes = [TIFF]
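To check whether other functions still lack argtypes, something like the following could help. This is only a sketch: functions_without_argtypes is a hypothetical helper, not part of pylibtiff, and it is demonstrated on libc (POSIX assumed) so that it runs without libtiff installed.

```python
import ctypes
import ctypes.util


def functions_without_argtypes(lib, names):
    """Return the entries of `names` whose ctypes function pointer has no argtypes set."""
    # A function pointer that never had argtypes assigned reports None.
    return [name for name in names if getattr(lib, name).argtypes is None]


# Assumption: POSIX; libc stands in for the libtiff library object here.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.snprintf.argtypes = [ctypes.c_char_p, ctypes.c_size_t, ctypes.c_char_p]

print(functions_without_argtypes(libc, ["snprintf", "printf"]))  # → ['printf']
```

For pylibtiff, the library object and the list of variadic function names (for example TIFFGetField and TIFFSetField) would be passed instead, assuming the loaded library is exposed as the libtiff attribute of libtiff_ctypes as the snippets above suggest.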

I tested your update to the library, and it fixed some of the errors my app was throwing, so progress has been made. I had primarily been accessing the metadata through pylibtiff with your changes, with no issues. I now believe the remaining segfault occurs when actually trying to read a tile/field from the TIFF file.

[Screenshot attached]

Process 31728 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000107c9a8d8 libtiff.6.dylib`JPEGVGetField + 124
libtiff.6.dylib`JPEGVGetField:
->  0x107c9a8d8 <+124>: str    w9, [x10]
    0x107c9a8dc <+128>: ldr    x8, [x8, #0x518]
    0x107c9a8e0 <+132>: ldr    x9, [sp, #0x8]
    0x107c9a8e4 <+136>: add    x10, x9, #0x8
Target 0: (python) stopped.
Process 31728 launched: '/Users/dagutman/devel/BDSA-Schema-Wrangler/.venv/bin/python' (arm64)

Didn't post the full stack trace..
[Screenshot attached]