PyTorch MacOS x86 fail: section __TEXT/__const address out of range for architecture x86_64 when building NNPACK
kulinseth opened this issue · 13 comments
The PyTorch MacOS build with NNPack is failing with: section __TEXT/__const address out of range for architecture x86_64
We see this behavior after upgrading Xcode to the latest version, 13.3.1.
The difference between Xcode 13.2.1 and 13.3 is that 13.3 adds more boundary checks to prevent out-of-bounds (OOB) reads.
The conv1x1.py.o object file has malformed load commands:
$ size -mlx conv1x1.py.o
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
total 0x36f
total 0x36f
The __const section starts at 0x300 and ends at 0x380 (0x300 + 0x80), which exceeds the __TEXT segment size (0x36f).
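The arithmetic behind that observation can be written as a minimal Python sketch. The function name `section_overflow` is hypothetical, and the numbers are copied from the `size -mlx` output above. It only checks whether a section's end address stays inside its segment; it is not a reimplementation of the linker's check.

```python
def section_overflow(seg_vmaddr, seg_size, sect_addr, sect_size):
    """Return how many bytes a section extends past the end of its segment (0 if it fits)."""
    seg_end = seg_vmaddr + seg_size
    sect_end = sect_addr + sect_size
    return max(0, sect_end - seg_end)


# Values taken from the size -mlx output for conv1x1.py.o
text_overflow = section_overflow(0x0, 0x36f, 0x0, 0x2ef)    # __text
const_overflow = section_overflow(0x0, 0x36f, 0x300, 0x80)  # __const

print(hex(text_overflow))   # __text ends at 0x2ef, inside the segment
print(hex(const_overflow))  # __const ends at 0x380, past 0x36f
```

With a segment size of 0x380 instead of 0x36f, the same call reports no overflow for __const.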
This object file is manually generated using the third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py script. Can we regenerate the object file with the latest Xcode to make sure this bug is fixed and there is no OOB access?
I'm trying to build PeachPy but I am getting:
src_dir = os.path.abspath(self.distribution.package_dir[""])
KeyError: ''
It's working now ... ;-)
I ran
$ python -m peachpy.x86_64 -mabi=sysv -mimage-format=mach-o -o ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py
clang
% clang --version
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: x86_64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Is this correct?
% size -mlx ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o
Segment __TEXT: 0x380 (vmaddr 0x0 fileoff 288)
Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
total 0x36f
total 0x380
(base) davidlaxer@x86_64-apple-darwin13 pytorch % ls -l ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o
-rw-r--r-- 1 davidlaxer staff 1427 May 25 07:09 ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o
Trying to build PyTorch next.
Thanks @dbl001 for taking a look. Make sure the command line is the same as what is passed to clang when building PyTorch. You can get that in verbose mode.
@Maratyszcza do you mind merging the latest master into the pre-generated branch? Or should I just fork it and maintain it myself for PyTorch?
@malfet Create a pull request, and I'll merge
My PyTorch build failed again. What's -g4?
cd /Users/davidlaxer/pytorch/build/confu-deps/NNPACK && PYTHONPATH=/Users/davidlaxer/pytorch/third_party/python-six:/Users/davidlaxer/pytorch/third_party/python-peachpy /Users/davidlaxer/anaconda3/bin/python -m peachpy.x86_64 -mabi=sysv -g4 -mimage-format=mach-o -I/Users/davidlaxer/pytorch/third_party/NNPACK/src -I/Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma -I/Users/davidlaxer/pytorch/third_party/FP16/include -o /Users/davidlaxer/pytorch/build/confu-deps/NNPACK/src/x86_64-fma/blas/conv1x1.py.o /Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py
...
ld: in lib/libnnpack.a(conv1x1.py.o), section __TEXT/__const address out of range for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
...
% size -mlx /Users/davidlaxer/pytorch/build/confu-deps/NNPACK/src/x86_64-fma/blas/conv1x1.py.o
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
total 0x36f
total 0x36f
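To compare the object from the build tree against a regenerated one without reading the hex by hand, the check can be scripted. This is a rough sketch that assumes the `size -mlx` output keeps the layout shown in this thread. The function name `out_of_range_sections` and both regular expressions are hypothetical helpers written for this output, not part of any tool.

```python
import re

# Patterns written against the size -mlx lines pasted in this thread (assumed format).
SEGMENT_RE = re.compile(r"Segment (\w+): (0x[0-9a-fA-F]+) \(vmaddr (0x[0-9a-fA-F]+)")
SECTION_RE = re.compile(r"Section \((\w+), (\w+)\): (0x[0-9a-fA-F]+) \(addr (0x[0-9a-fA-F]+)")


def out_of_range_sections(size_output):
    """Return (segment, section) pairs whose address range leaves the segment."""
    segments = {}  # segment name -> (start address, end address)
    bad = []
    for line in size_output.splitlines():
        line = line.strip()
        seg = SEGMENT_RE.match(line)
        if seg:
            size, vmaddr = int(seg.group(2), 16), int(seg.group(3), 16)
            segments[seg.group(1)] = (vmaddr, vmaddr + size)
            continue
        sect = SECTION_RE.match(line)
        if sect and sect.group(1) in segments:
            size, addr = int(sect.group(3), 16), int(sect.group(4), 16)
            start, end = segments[sect.group(1)]
            if addr < start or addr + size > end:
                bad.append((sect.group(1), sect.group(2)))
    return bad


# Output copied from the failing conv1x1.py.o above.
SAMPLE = """\
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
total 0x36f
total 0x36f
"""

print(out_of_range_sections(SAMPLE))
```

Feeding it the output of the manually regenerated conv1x1.o (segment size 0x380) should yield an empty list, which is one way to tell which version of PeachPy produced a given object.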
find . -name 'peachpy*' -ls
230205417 0 drwxr-xr-x 21 davidlaxer staff 672 May 20 07:37 ./third_party/python-peachpy/peachpy
230205416 40 -rw-r--r-- 1 davidlaxer staff 19560 May 19 10:13 ./third_party/python-peachpy/logo/peachpy.png
230205498 8 -rw-r--r-- 1 davidlaxer staff 864 May 19 10:13 ./third_party/python-peachpy/sphinx/peachpy.rst
230166126 0 drwxr-xr-x 5 davidlaxer staff 160 May 19 10:12 ./third_party/FP16/test/peachpy
@dbl001 have you updated the submodules? I landed the change about an hour ago, and that should have fixed it.
Is it still pinned to the old version in PyTorch?
E.g. #76094
I'm not sure I understand the question
I think this has been addressed.