Ensure proper repair of wheels
agriyakhetarpal opened this issue · 26 comments
I notice that the wheels currently have hardcoded platform tags that do not check for GLIBC version constraints or the macOS version constraints, and therefore might not be repaired fully, i.e., additional tags can be added for the Linux wheels and macOS wheels can better reflect the minimum required OSX version.
A quick experiment with delocate locally led me to find out that the minum required macOS version is macOS X 11.7 for both M-series and Intel wheels, and not 11.0 and 10.9 respectively. On my local machine (macOS), it is not possible to use auditwheel because of hard-coded checks to stop usage outside of Linux platforms, therefore, I tested using the cross-platform alternative repairwheel, which granted additional tags to some of the Linux wheels:
- The Linux x86_64 wheel additionally conforms to GLIBC 2.5, i.e., manylinux_2_5 (even though it's been EOLed for some while12 now). The wheel is correctly marked to have no additional dependencies, being a statically linked binary.
- This was similar for the Linux i686 wheel.
- For aarch64 and armv7l, it added the
linux_1_1_aarch64and thelinux_2_17_armv7ltags for some reason, which won't be accepted on PyPI. I think that is a bug onrepairwheel's side.auditwheelmight be more robust in this case.
For Windows wheels, delvewheel showcased that no external DLLs are required or need to be mangled, as expected, and did not modify the wheel.
I would like to propose that an additional step in the make_wheels.py file after the wheels have been downloaded can do this action, and repairwheel can be added as a dependency in pyproject.toml. It would also make sense to provide an option through the command-line argument parser to not repair the wheels if needed, but I would argue that it's most likely better to have repaired wheels by default. Owing to how the binary inside the wheel can change during linkage, I would recommend this as an additional step instead of hardcoding the new tags in the values of the ZIG_PYTHON_PLATFORMS dictionary (those can then remain the same, or be updated).
Footnotes
I would like to propose that an additional step in the
make_wheels.pyfile after the wheels have been downloaded can do this action, andrepairwheelcan be added as a dependency inpyproject.toml.
This is broadly speaking desirable but I have some implementation concerns. If you prototype this I can look into it.
Sure! I had a revelation – instead of making the make_wheels.py script longer, it would be much better to set up some CI in this repository and use actual GitHub Actions runners, since repairwheel does not have a stable v1 release. I would be happy to do so for all three platforms, additional architectures won't bring an issue when repairing wheels as long as the platform remains the same.
So, would you be okay with GitHub-hosted CI, and with the general proposition to add CI to this repository?
No, I don't think that will work, because it's important that wheels uploaded to PyPI are reproducible by anyone who wants, with the exact same SHA checksum.
So if we use repairwheel it needs to be fixed to a specific commit that is checked into the repo, either using PDM lockfiles, or just a normal pyproject.toml dependency.
As for CI, I'm not concerned about having it, and I don't want it to become a maintenance overhead.
Also, repairwheel must produce exact same, bit-for-bit, output for the exact same input, or it can't be used at all.
Also, repairwheel must produce exact same, bit-for-bit, output for the exact same input, or it can't be used at all.
As far as I understand, repairwheel will produce a deterministic output every time given any input files.
No, I don't think that will work, because it's important that wheels uploaded to PyPI are reproducible by anyone who wants, with the exact same SHA checksum.
This will be much more difficult, however... since all four of these tools, on any platform, will do wheel unpack, copy additional files into the wheel (not here, though), and rename them to add more platform tags the wheel is compatible with:
shasum wheelhouse/ziglang-0.12.0-py3-none-macosx_10_9_x86_64.whl
73ad7c6edcc8e04cd5a819130b3c230d11bd87b3 wheelhouse/ziglang-0.12.0-py3-none-macosx_10_9_x86_64.whl
and
shasum dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64.whl
ac2b140d30c409beebd6438f64eb0ec59927ebaa dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64.whl
i.e., the macOS wheels before and after have different checksums, and this shall similarly be so for Linux. I checked by renaming the file into the previous name again, and the checksum remained the same as the newly modified one, so, the wheel was definitely unzipped and zipped back again during repair.
Adding CI indeed brings maintenance overhead. One option that we have still remains, and should work for all of the cases, is hardcoding the new tags (at least the delocate ones are correct for macOS, I'll have to run this in CI on my fork to confirm the Linux ones, and the Windows wheels don't need them). I assume from ziglang/zig#14542 that Zig is making three or four releases a year at most, so manually verifying whether the hardcoded tags are correct will be a seasonal effort for maintainers here (but I will be happy to verify them, too, if you wish to ping me). Renaming the wheels is essentially the same as repairing them, in this case, thanks to the static binaries.
repairwheel is matching outputs with auditwheel, no worries there (my previous complaint was wrong) but auditwheel has another bug, where it's adding the manylinux_2_5_x86_64 and manylinux1_x86_64 platform tags to the
- the
i386-linux x86-linuxaarch64-linuxarmv7a-linux
wheels, which seems all incorrect to me – I'll file an issue on their tracker.
The workflow run, here: https://github.com/agriyakhetarpal/zig-pypi/actions/runs/8841793735 (PR linked above) has the required output, so I think this should be the diff (the i386-linux and x86-linux keys are already correct with MUSL 1.1 and GLIBC 2.12, and x86_64-linux can receive extra tags):
ZIG_PYTHON_PLATFORMS = {
'x86_64-windows': 'win_amd64',
'x86-windows': 'win32',
- 'x86_64-macos': 'macosx_10_9_x86_64',
- 'aarch64-macos': 'macosx_11_0_arm64',
+ 'x86_64-macos': 'macosx_11_7_x86_64',
+ 'aarch64-macos': 'macosx_11_7_arm64',
'i386-linux': 'manylinux_2_12_i686.manylinux2010_i686.musllinux_1_1_i686',
# renamed i386 to x86 since v0.11.0, i386 was last supported in v0.10.1
'x86-linux': 'manylinux_2_12_i686.manylinux2010_i686.musllinux_1_1_i686',
- 'x86_64-linux': 'manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64',
+ 'x86_64-linux': 'manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.manylinux_2_5_x86_64.manylinux1_x86_64',
'aarch64-linux':
'manylinux_2_17_aarch64.manylinux2014_aarch64.musllinux_1_1_aarch64',
'armv7a-linux': 'manylinux_2_17_armv7l.manylinux2014_armv7l.musllinux_1_1_armv7l',While I may not be totally sure about Linux, with my macOS machine I can confirm whether delocate is correct with the macOS tags, and yes, it is:
Here's what otool says:
macOS arm64 wheel
$ otool -l dist/ziglang-0.12.0-py3-none-macosx_11_7_arm64/ziglang/zigdist/ziglang-0.12.0-py3-none-macosx_11_7_arm64/ziglang/zig:
Load command 0
cmd LC_SEGMENT_64
cmdsize 72
segname __PAGEZERO
vmaddr 0x0000000000000000
vmsize 0x0000000100000000
fileoff 0
filesize 0
maxprot 0x00000000
initprot 0x00000000
nsects 0
flags 0x0
Load command 1
cmd LC_SEGMENT_64
cmdsize 792
segname __TEXT
vmaddr 0x0000000100000000
vmsize 0x00000000082cf5e4
fileoff 0
filesize 137164260
maxprot 0x00000005
initprot 0x00000005
nsects 9
flags 0x0
Section
sectname __text
segname __TEXT
addr 0x00000001000008f8
size 0x0000000004dc8384
offset 2296
align 2^2 (4)
reloff 0
nreloc 0
flags 0x80000400
reserved1 0
reserved2 0
Section
sectname __stubs
segname __TEXT
addr 0x0000000104dc8c7c
size 0x0000000000002fa0
offset 81562748
align 2^2 (4)
reloff 0
nreloc 0
flags 0x80000408
reserved1 0 (index into indirect symbol table)
reserved2 12 (size of stubs)
Section
sectname __stub_helper
segname __TEXT
addr 0x0000000104dcbc1c
size 0x0000000000000c54
offset 81574940
align 2^2 (4)
reloff 0
nreloc 0
flags 0x80000400
reserved1 0
reserved2 0
Section
sectname __const
segname __TEXT
addr 0x0000000104dcc870
size 0x00000000029b6170
offset 81578096
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __cstring
segname __TEXT
addr 0x00000001077829e0
size 0x00000000003c973b
offset 125315552
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000002
reserved1 0
reserved2 0
Section
sectname __gcc_except_tab
segname __TEXT
addr 0x0000000107b4c11c
size 0x0000000000001bdc
offset 129286428
align 2^2 (4)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __ustring
segname __TEXT
addr 0x0000000107b4dcf8
size 0x0000000000000142
offset 129293560
align 2^1 (2)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __unwind_info
segname __TEXT
addr 0x0000000107b4de3c
size 0x0000000000139604
offset 129293884
align 2^2 (4)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __eh_frame
segname __TEXT
addr 0x0000000107c87440
size 0x00000000006481a4
offset 130577472
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Load command 2
cmd LC_SEGMENT_64
cmdsize 232
segname __DATA_CONST
vmaddr 0x00000001082d0000
vmsize 0x000000000000a9f0
fileoff 137166848
filesize 43504
maxprot 0x00000003
initprot 0x00000003
nsects 2
flags 0x10
Section
sectname __got
segname __DATA_CONST
addr 0x00000001082d0000
size 0x0000000000009738
offset 137166848
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000006
reserved1 1016 (index into indirect symbol table)
reserved2 0
Section
sectname __mod_init_func
segname __DATA_CONST
addr 0x00000001082d9738
size 0x00000000000012b8
offset 137205560
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000009
reserved1 0
reserved2 0
Load command 3
cmd LC_SEGMENT_64
cmdsize 712
segname __DATA
vmaddr 0x00000001082dc000
vmsize 0x00000000005de948
fileoff 137216000
filesize 5629748
maxprot 0x00000003
initprot 0x00000003
nsects 8
flags 0x0
Section
sectname __la_symbol_ptr
segname __DATA
addr 0x00000001082dc000
size 0x0000000000001fc0
offset 137216000
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000007
reserved1 5855 (index into indirect symbol table)
reserved2 0
Section
sectname __const
segname __DATA
addr 0x00000001082ddfc0
size 0x000000000047e348
offset 137224128
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __data
segname __DATA
addr 0x000000010875c310
size 0x00000000000de2d0
offset 141935376
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __thread_vars
segname __DATA
addr 0x000000010883a5e0
size 0x0000000000000150
offset 142845408
align 2^0 (1)
reloff 0
nreloc 0
flags 0x00000013
reserved1 0
reserved2 0
Section
sectname __thread_data
segname __DATA
addr 0x000000010883a730
size 0x0000000000000004
offset 142845744
align 2^2 (4)
reloff 0
nreloc 0
flags 0x00000011
reserved1 0
reserved2 0
Section
sectname __thread_bss
segname __DATA
addr 0x000000010883a738
size 0x00000000000001e8
offset 0
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000012
reserved1 0
reserved2 0
Section
sectname __bss
segname __DATA
addr 0x000000010883a920
size 0x0000000000077728
offset 0
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000001
reserved1 0
reserved2 0
Section
sectname __common
segname __DATA
addr 0x00000001088b2050
size 0x00000000000088f8
offset 0
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000001
reserved1 0
reserved2 0
Load command 4
cmd LC_SEGMENT_64
cmdsize 72
segname __LINKEDIT
vmaddr 0x00000001088bc000
vmsize 0x0000000001dfc000
fileoff 142852096
filesize 31429984
maxprot 0x00000001
initprot 0x00000001
nsects 0
flags 0x0
Load command 5
cmd LC_DYLD_INFO_ONLY
cmdsize 48
rebase_off 142852096
rebase_size 182880
bind_off 143034976
bind_size 312
weak_bind_off 143035288
weak_bind_size 312248
lazy_bind_off 143347536
lazy_bind_size 4984
export_off 143352520
export_size 2924208
Load command 6
cmd LC_FUNCTION_STARTS
cmdsize 16
dataoff 146276728
datasize 0
Load command 7
cmd LC_DATA_IN_CODE
cmdsize 16
dataoff 146276728
datasize 0
Load command 8
cmd LC_SYMTAB
cmdsize 24
symoff 146276728
nsyms 226954
stroff 149935480
strsize 24006729
Load command 9
cmd LC_DYSYMTAB
cmdsize 80
ilocalsym 0
nlocalsym 179448
iextdefsym 179448
nextdefsym 47201
iundefsym 226649
nundefsym 305
tocoff 0
ntoc 0
modtaboff 0
nmodtab 0
extrefsymoff 0
nextrefsyms 0
indirectsymoff 149907992
nindirectsyms 6871
extreloff 0
nextrel 0
locreloff 0
nlocrel 0
Load command 10
cmd LC_LOAD_DYLINKER
cmdsize 32
name /usr/lib/dyld (offset 12)
Load command 11
cmd LC_MAIN
cmdsize 24
entryoff 71432336
stacksize 33554432
Load command 12
cmd LC_SOURCE_VERSION
cmdsize 16
version 0.0
Load command 13
cmd LC_BUILD_VERSION
cmdsize 32
platform 1
minos 11.7.1
sdk 14.0
ntools 1
tool 5
version 0.0
Load command 14
cmd LC_UUID
cmdsize 24
uuid 8190EC2C-A7F2-3ECE-B209-55605CBB2084
Load command 15
cmd LC_LOAD_DYLIB
cmdsize 56
name /usr/lib/libSystem.B.dylib (offset 24)
time stamp 2 Thu Jan 1 05:30:02 1970
current version 1336.0.0
compatibility version 1.0.0
Load command 16
cmd LC_CODE_SIGNATURE
cmdsize 16
dataoff 173942224
datasize 339856
macOS amd64 wheel
$ otool -l dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64/ziglang/zigreturns
dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64/ziglang/zig:
Load command 0
cmd LC_SEGMENT_64
cmdsize 72
segname __PAGEZERO
vmaddr 0x0000000000000000
vmsize 0x0000000100000000
fileoff 0
filesize 0
maxprot 0x00000000
initprot 0x00000000
nsects 0
flags 0x0
Load command 1
cmd LC_SEGMENT_64
cmdsize 792
segname __TEXT
vmaddr 0x0000000100000000
vmsize 0x00000000088e8eb0
fileoff 0
filesize 143560368
maxprot 0x00000005
initprot 0x00000005
nsects 9
flags 0x0
Section
sectname __text
segname __TEXT
addr 0x0000000100000940
size 0x0000000005ad32b5
offset 2368
align 2^6 (64)
reloff 0
nreloc 0
flags 0x80000400
reserved1 0
reserved2 0
Section
sectname __stubs
segname __TEXT
addr 0x0000000105ad3bf6
size 0x000000000000181e
offset 95239158
align 2^1 (2)
reloff 0
nreloc 0
flags 0x80000408
reserved1 0 (index into indirect symbol table)
reserved2 6 (size of stubs)
Section
sectname __stub_helper
segname __TEXT
addr 0x0000000105ad5414
size 0x0000000000000a42
offset 95245332
align 2^2 (4)
reloff 0
nreloc 0
flags 0x80000400
reserved1 0
reserved2 0
Section
sectname __const
segname __TEXT
addr 0x0000000105ad5e80
size 0x000000000282c190
offset 95248000
align 2^6 (64)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __cstring
segname __TEXT
addr 0x0000000108302010
size 0x00000000003c8ecb
offset 137371664
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000002
reserved1 0
reserved2 0
Section
sectname __gcc_except_tab
segname __TEXT
addr 0x00000001086caedc
size 0x0000000000001b1c
offset 141340380
align 2^2 (4)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __ustring
segname __TEXT
addr 0x00000001086cca00
size 0x000000000000017c
offset 141347328
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __unwind_info
segname __TEXT
addr 0x00000001086ccb7c
size 0x00000000000ef430
offset 141347708
align 2^2 (4)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __eh_frame
segname __TEXT
addr 0x00000001087bbfb0
size 0x000000000012cf00
offset 142327728
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Load command 2
cmd LC_SEGMENT_64
cmdsize 232
segname __DATA_CONST
vmaddr 0x00000001088e9000
vmsize 0x0000000000002828
fileoff 143560704
filesize 10280
maxprot 0x00000003
initprot 0x00000003
nsects 2
flags 0x10
Section
sectname __got
segname __DATA_CONST
addr 0x00000001088e9000
size 0x0000000000001570
offset 143560704
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000006
reserved1 1029 (index into indirect symbol table)
reserved2 0
Section
sectname __mod_init_func
segname __DATA_CONST
addr 0x00000001088ea570
size 0x00000000000012b8
offset 143566192
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000009
reserved1 0
reserved2 0
Load command 3
cmd LC_SEGMENT_64
cmdsize 792
segname __DATA
vmaddr 0x00000001088ec000
vmsize 0x00000000005df750
fileoff 143572992
filesize 5595428
maxprot 0x00000003
initprot 0x00000003
nsects 9
flags 0x0
Section
sectname __la_symbol_ptr
segname __DATA
addr 0x00000001088ec000
size 0x0000000000002028
offset 143572992
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000007
reserved1 1715 (index into indirect symbol table)
reserved2 0
Section
sectname __const
segname __DATA
addr 0x00000001088ee030
size 0x0000000000474a18
offset 143581232
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __data
segname __DATA
addr 0x0000000108d62a50
size 0x00000000000df158
offset 148253264
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __static_data
segname __DATA
addr 0x0000000108e41bc0
size 0x0000000000000410
offset 149167040
align 2^6 (64)
reloff 0
nreloc 0
flags 0x00000000
reserved1 0
reserved2 0
Section
sectname __thread_vars
segname __DATA
addr 0x0000000108e41fd0
size 0x0000000000000150
offset 149168080
align 2^0 (1)
reloff 0
nreloc 0
flags 0x00000013
reserved1 0
reserved2 0
Section
sectname __thread_data
segname __DATA
addr 0x0000000108e42120
size 0x0000000000000004
offset 149168416
align 2^2 (4)
reloff 0
nreloc 0
flags 0x00000011
reserved1 0
reserved2 0
Section
sectname __thread_bss
segname __DATA
addr 0x0000000108e42128
size 0x00000000000001e8
offset 0
align 2^3 (8)
reloff 0
nreloc 0
flags 0x00000012
reserved1 0
reserved2 0
Section
sectname __bss
segname __DATA
addr 0x0000000108e42310
size 0x0000000000080298
offset 0
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000001
reserved1 0
reserved2 0
Section
sectname __common
segname __DATA
addr 0x0000000108ec25b0
size 0x00000000000091a0
offset 0
align 2^4 (16)
reloff 0
nreloc 0
flags 0x00000001
reserved1 0
reserved2 0
Load command 4
cmd LC_SEGMENT_64
cmdsize 72
segname __LINKEDIT
vmaddr 0x0000000108ecc000
vmsize 0x0000000001da9000
fileoff 149172224
filesize 31100415
maxprot 0x00000001
initprot 0x00000001
nsects 0
flags 0x0
Load command 5
cmd LC_DYLD_INFO_ONLY
cmdsize 48
rebase_off 149172224
rebase_size 183464
bind_off 149355688
bind_size 328
weak_bind_off 149356016
weak_bind_size 133024
lazy_bind_off 149489040
lazy_bind_size 5024
export_off 149494064
export_size 2924536
Load command 6
cmd LC_FUNCTION_STARTS
cmdsize 16
dataoff 152418600
datasize 0
Load command 7
cmd LC_DATA_IN_CODE
cmdsize 16
dataoff 152418600
datasize 65600
Load command 8
cmd LC_SYMTAB
cmdsize 24
symoff 152484200
nsyms 231136
stroff 156193352
strsize 24079287
Load command 9
cmd LC_DYSYMTAB
cmdsize 80
ilocalsym 0
nlocalsym 183611
iextdefsym 183611
nextdefsym 47220
iundefsym 230831
nundefsym 305
tocoff 0
ntoc 0
modtaboff 0
nmodtab 0
extrefsymoff 0
nextrefsyms 0
indirectsymoff 156182376
nindirectsyms 2744
extreloff 0
nextrel 0
locreloff 0
nlocrel 0
Load command 10
cmd LC_LOAD_DYLINKER
cmdsize 32
name /usr/lib/dyld (offset 12)
Load command 11
cmd LC_MAIN
cmdsize 24
entryoff 82274208
stacksize 33554432
Load command 12
cmd LC_SOURCE_VERSION
cmdsize 16
version 0.0
Load command 13
cmd LC_BUILD_VERSION
cmdsize 32
platform 1
minos 11.7.1
sdk 14.0
ntools 1
tool 5
version 0.0
Load command 14
cmd LC_UUID
cmdsize 24
uuid EE6641C7-D65A-3EB5-9A0A-1B119D63ACC9
Load command 15
cmd LC_LOAD_DYLIB
cmdsize 56
name /usr/lib/libSystem.B.dylib (offset 24)
time stamp 2 Thu Jan 1 05:30:02 1970
current version 1336.0.0
compatibility version 1.0.0
Therefore, both of them were likely built on a macOS 14 Sonoma device with MACOSX_DEPLOYMENT_TARGET set to 11.7.1, and are therefore incompatible with ABIs for OSX 10.9 and 11.0, as noted in minos and sdk sections for the thirteenth load command.
I'll revise this in a PR for now – so, it's up to you if you wish to keep this issue open, since a restricted set of platform tags provides benefits in other ways, such as lesser problems for those with older Linux systems (because they won't be able to download the wheel at all). I'm by no means an expert on debugging GLIBC compatibility issues, and a very novice user of Zig, so I can only answer as much about the Python packaging ecosystem more.
because they won't be able to download the wheel at all
I think what will happen is that unless we yank and re-publish all the wheels (which raises questions about whether Zig has changed the deployment target--I suspect it did) it'll just result in an older wheel being downloaded.
This will be much more difficult, however... since all four of these tools, on any platform, will do
wheel unpack, copy additional files into the wheel (not here, though), and rename them to add more platform tags the wheel is compatible with:
As long as this process is completely deterministic that's fine, but if it's not I'm not going to use it since it breaks reproducibility.
so manually verifying whether the hardcoded tags are correct will be a seasonal effort for maintainers here
I don't have time to do this but I can ping you about it.
As long as this process is completely deterministic that's fine, but if it's not I'm not going to use it since it breaks reproducibility
Fair enough – I confirmed with a few more tests and it does break reproducibility, since repairing the same wheel multiple times displays a different SHA-256 checksum each time, regardless of whether I use repairwheel or delocate.
I don't have time to do this but I can ping you about it.
Much appreciated :) I am happy to help out with every new release of Zig.
Fair enough – I confirmed with a few more tests and it does break reproducibility, since repairing the same wheel multiple times displays a different SHA-256 checksum each time, regardless of whether I use
repairwheelordelocate.
I suspected this would be the case if something unpacks and repacks the archive, since most people are quite blasé about implementing that and don't care much about timestamps or file ordering.
I suspected this would be the case if something unpacks and repacks the archive, since most people are quite blasé about implementing that and don't care much about timestamps or file ordering.
Yes, it's most certainly doing that – in cases where libraries are needed to be vendored into the wheel, it has to be unzipped and a hidden .dylibs folder gets created inside it where all of them are stored (and this is by default wheel.libs for Windows). However, I did have an idea for cases like the Zig wheels here, I wonder if it would be possible to do a three-way copy, or a dry-run mode, i.e.,
- Duplicate the wheel A with wheel B
- Look at the duplicate to see if it needs shared dependencies to be copied
- If no, and it's just that some new tags can be added, then keep A and therefore the original hash with it, just rename it and therefore you get to keep the hash. Otherwise, repair the wheel into wheel C modifying the hash, and discarding A and B.
We should be able to do this on our own with a shell script or two, but I feel that ensuring reproducibility is important in general – if these wheel repair tools can offer this use case upstream, it would be highly beneficial. I'll put in a feature request there.
We should be able to do this on our own with a shell script or two,
This repo is supposed to work on Windows (not Cygwin or MinGW), so I don't think a shell script cuts it.
I'm also not completely sold on the approach of "repairing" the wheel, vs. producing a correct one in first place.
This repo is supposed to work on Windows (not Cygwin or MinGW), so I don't think a shell script cuts it.
Ah, apologies if I was terse. With this, I meant having a PowerShell script as well, since it's now the default shell on Windows. A general Python script would be a better cross-platform alternative, in any case.
I'm also not completely sold on the approach of "repairing" the wheel, vs. producing a correct one in first place.
I agree, this is just one of the solutions. I think that out of all of the solutions, it's still better to dry-run repair, and adjust platform tags manually, at least not until this is added as a feature. Therefore, I can offer no better than that at this time.
Although, I wonder if you could version the script with tags for this repository. That would ensure a higher level of reproducibility. For example, changes like #21 have bumped the macOS minimum version, and if someone now wants to build a bundle of macOS wheels for older Zig versions, say, 0.7.0, they'll receive this new constraint too – but Zig 0.7.0 might be compatible with older versions of macOS, too say, 10.15, (I haven't checked though) and having 11.7 here renders the wheel to be incorrectly tagged). Asking users to check out a particular tag is better than asking them to check out an older commit. That way, you can specify lower and upper bounds for Zig wheels for a particular version of the script, i.e., highlighting what they can build wheels for – the layperson's time machine.
This might be too much to maintain, though, so I understand if you don't wish to do this and would prefer responding to users individually if someone runs into this problem and then chooses to report it.
The problem with tagging is that e.g. I backfilled the 32-bit wheels for 0.11.0 when you asked for it...
Ah, well, as long as there are not a lot of users for the repository (and if there won't be many), tags are not needed and it's possible to edit the script and publish the wheels manually :) (and many thanks for uploading those!)
Yeah, it's kind of an error-prone process in first place.
One thing that can be done with CI is OIDC based "trusted publishing". I think PyPI did not have it originally when I came up wit this project or I'd have used it. In that case, there is a transparency record published in a log, referring to the git repo, CI workflow, CI workflow logs, etc every time a package is published.
Yeah, it's kind of an error-prone process in first place.
One thing that can be done with CI is OIDC based "trusted publishing". I think PyPI did not have it originally when I came up wit this project or I'd have used it. In that case, there is a transparency record published in a log, referring to the git repo, CI workflow, CI workflow logs, etc every time a package is published.
That works, too. Additionally, please consider signing the binaries with Sigstore, that way they are verifiable (if you publish the relevant signature files to GitHub Releases). I would be happy to set CI up for you for both of these things.
I think we can start with a workflow responding to workflow_dispatch event where the workflow itself uses both Sigstore and Trusted Publishing, with the pushing of wheels still being manual. Then we can consider something more automated after that.
I agree that reproducibility is probably not enough and I'd really rather not be on a critical path to accurate delivery of opaque binaries.
Thank you for your comments. I agree workflow_dispatch event makes sense – I shall be on to this in a couple of days.
Fair enough – I confirmed with a few more tests and it does break reproducibility, since repairing the same wheel multiple times displays a different SHA-256 checksum each time, regardless of whether I use repairwheel or delocate.
(Hi - author of repairwheel here)
Are you seeing this when repairing a single wheel multiple times (i.e., repairing an already-repaired wheel), or are you seeing different output when repairing the same original wheel multiple times?
If it's the latter I'd definitely be interested in more details. repairwheel takes steps to ensure the reproducibility of outputs by sorting zip entries and clearing timestamps. The repos' own CI tests ensure that a repaired wheel's checksum is equivalent whether the repair operation is run on linux, windows, or macos.
Hi, @jvolkman! Thanks for chiming in – I just tried to build the wheel here again with python make_wheels.py --platform x86_64-linux – just an experiment with a single wheel:
c7ae866b8a76a568e2d5cfd31fe89cdb629bdd161fdd5018b29a4a0a17045cad https://ziglang.org/download/0.12.0/zig-linux-x86_64-0.12.0.tar.xz
0c777f1d5d9be32d0edffc79956d03ed35f868a8db7b0552da0984182aa10ff0 dist/ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl
I then copied this wheel to a particular directory, repaired it with repairwheel into a different one.
First attempt
pipx run repairwheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl -o wheelhouse
Wheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl has multiple tags; using first (py3-none-musllinux_1_1_x86_64)
ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl
is consistent with the following platform tag: "linux_1_1_x86_64".
The wheel references no external versioned symbols from
system-provided shared libraries.
The wheel requires no external shared libraries! :)
Wrote /Users/agriyakhetarpal/Desktop/zig-pypi/wheelhouse/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl
The SHA-256 checksum of the wheel:
shasum -a 256 wheelhouse/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl
0db5afefea085ea235e4537d3d1c870e9c1071a53f484445e0bd528c03b5f70d wheelhouse/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl
and I repaired the same wheel, again, this time into a different directory (not repairing an already repaired wheel):
Second time
pipx run repairwheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl -o wheelhouse1
Wheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl has multiple tags; using first (py3-none-musllinux_1_1_x86_64)
ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl
is consistent with the following platform tag: "linux_1_1_x86_64".
The wheel references no external versioned symbols from
system-provided shared libraries.
The wheel requires no external shared libraries! :)
Wrote /Users/agriyakhetarpal/Desktop/zig-pypi/wheelhouse1/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl
and the checksum is indeed the same:
0db5afefea085ea235e4537d3d1c870e9c1071a53f484445e0bd528c03b5f70d wheelhouse1/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl
So, yes, you're right – I was indeed repairing an already-repaired wheel multiple times, and repairing the same unrepaired wheel is deterministic, which is good – resolves many of the problems here.
However, as a by-product, the issue that I noticed is about the wheel is receiving a linux_1_1_x86_64 tag upon repair – this does not seem to be valid? It does not have the musl- prefix to conform to MUSL 1.1 (which is already present in the unrepaired wheel). A similar bug with an unwanted tag can be noticed when I build the aarch64 Linux wheels and then proceed to repair them, it adds the linux_1_1_aarch64 tag. I was going to file a bug report this week about this, good to see that you came to the pedestal yourself, haha!
Same thing with the armv7l wheels – it adds the linux_2_17_armv7l tag upon repair. I am happy to file a detailed bug report on repairwheel's issue tracker with more details, at least, for organisational purposes in case someone else stumbles into this – and contribute as well, if this is something that can be fixed without changing any vendored libraries that repairwheel could be using internally.
Got it. Good to hear about the reproducible repair.
I'd guess the other issue is just that repairing musl wheels isn't currently supported, and the failure manifests as that ancient linux tag being added. The problem is that auditwheel determines that a wheel is musl and what its musl version is by looking at /lib/libc.musl* (here), which doesn't exist when repairing cross platform. And so a glibc repair is currently assumed.
I don't have a good solution, but it would be nice to support musl wheels somehow. Feel free to open an issue to track that if you'd like.