Maratyszcza/PeachPy

PyTorch MacOS x86 fail: section __TEXT/__const address out of range for architecture x86_64 when building NNPACK

kulinseth opened this issue · 13 comments

The PyTorch macOS build with NNPACK is failing with: section __TEXT/__const address out of range for architecture x86_64

We see this behavior after upgrading Xcode to the latest version, 13.3.1.
The difference between Xcode 13.2.1 and 13.3 is that 13.3 adds more boundary checks to prevent out-of-bounds (OOB) reads.

The conv1x1.py.o object file has malformed load commands:
$ size -mlx conv1x1.py.o
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
total 0x36f
total 0x36f
The __const section starts at 0x300 and ends at 0x380, which exceeds the __TEXT segment size (0x36f).
The object file is generated manually using the third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py script. Can we regenerate the object file with the latest Xcode to make sure this bug is fixed and there is no OOB access?
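The range check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the numbers from the size -mlx output; the function name section_fits is made up for this example and is not part of PeachPy or the linker.

```python
def section_fits(seg_vmaddr, seg_vmsize, sect_addr, sect_size):
    """Return True if [sect_addr, sect_addr + sect_size) lies inside the segment."""
    seg_end = seg_vmaddr + seg_vmsize
    sect_end = sect_addr + sect_size
    return seg_vmaddr <= sect_addr and sect_end <= seg_end


# Values taken from the size -mlx output for conv1x1.py.o
SEG_VMADDR, SEG_VMSIZE = 0x0, 0x36F      # Segment __TEXT
CONST_ADDR, CONST_SIZE = 0x300, 0x80     # Section (__TEXT, __const)

print(hex(CONST_ADDR + CONST_SIZE))                                   # 0x380
print(section_fits(SEG_VMADDR, SEG_VMSIZE, CONST_ADDR, CONST_SIZE))   # False
```

With these numbers the section ends 0x11 bytes past the end of the segment, so the segment size would have to be at least 0x380 for the check to pass.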

I'm trying to build PeachPy but I am getting:

src_dir = os.path.abspath(self.distribution.package_dir[""])
KeyError: ''

#118

It's working now ... ;-)
I ran

$ python -m peachpy.x86_64 -mabi=sysv -mimage-format=mach-o -o ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py

clang

% clang --version
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: x86_64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Is this correct?

% size -mlx ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o
Segment __TEXT: 0x380 (vmaddr 0x0 fileoff 288)
	Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
	Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
	total 0x36f
total 0x380
(base) davidlaxer@x86_64-apple-darwin13 pytorch % ls -l ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o
-rw-r--r--  1 davidlaxer  staff  1427 May 25 07:09 ./third_party/NNPACK/src/x86_64-fma/blas/conv1x1.o

Trying to build PyTorch next.

Thanks @dbl001 for taking a look. Make sure the command line is the same as what is passed to clang when building PyTorch. You can get that in verbose mode.

I wonder if this was already fixed by f8ef1a3

@Maratyszcza do you mind merging the latest master into the pre-generated branch? Or should I just fork it and maintain it myself for PyTorch?

@malfet Create a pull request, and I'll merge

My PyTorch build failed again. What's -g4?

cd /Users/davidlaxer/pytorch/build/confu-deps/NNPACK && PYTHONPATH=/Users/davidlaxer/pytorch/third_party/python-six:/Users/davidlaxer/pytorch/third_party/python-peachpy /Users/davidlaxer/anaconda3/bin/python -m peachpy.x86_64 -mabi=sysv -g4 -mimage-format=mach-o -I/Users/davidlaxer/pytorch/third_party/NNPACK/src -I/Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma -I/Users/davidlaxer/pytorch/third_party/FP16/include -o /Users/davidlaxer/pytorch/build/confu-deps/NNPACK/src/x86_64-fma/blas/conv1x1.py.o /Users/davidlaxer/pytorch/third_party/NNPACK/src/x86_64-fma/blas/conv1x1.py
...
ld: in lib/libnnpack.a(conv1x1.py.o), section __TEXT/__const address out of range for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
...
 % size -mlx  /Users/davidlaxer/pytorch/build/confu-deps/NNPACK/src/x86_64-fma/blas/conv1x1.py.o
Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
	Section (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
	Section (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
	total 0x36f
total 0x36f
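
To avoid comparing these numbers by eye after each rebuild, the size -mlx text can be checked with a small script. This is a rough sketch that assumes the output has exactly the "Segment ..." / "Section (...)" layout shown above; the regular expressions and the function name out_of_range_sections are made up for this example and are not part of any tool.

```python
import re

# Patterns written against the size -mlx lines shown in this thread (an assumption).
SEGMENT_RE = re.compile(r"Segment ([^\s:]+): (0x[0-9a-fA-F]+) \(vmaddr (0x[0-9a-fA-F]+)")
SECTION_RE = re.compile(
    r"Section \(([^\s,]+), ([^\s)]+)\): (0x[0-9a-fA-F]+) \(addr (0x[0-9a-fA-F]+)"
)


def out_of_range_sections(size_output):
    """Return (segment, section, section_end, segment_end) for sections past their segment."""
    segments = {}
    bad = []
    for line in size_output.splitlines():
        m = SEGMENT_RE.search(line)
        if m:
            vmsize, vmaddr = int(m.group(2), 16), int(m.group(3), 16)
            segments[m.group(1)] = (vmaddr, vmaddr + vmsize)
            continue
        m = SECTION_RE.search(line)
        if m and m.group(1) in segments:
            size, addr = int(m.group(3), 16), int(m.group(4), 16)
            seg_start, seg_end = segments[m.group(1)]
            if addr < seg_start or addr + size > seg_end:
                bad.append((m.group(1), m.group(2), addr + size, seg_end))
    return bad


SAMPLE = """Segment __TEXT: 0x36f (vmaddr 0x0 fileoff 288)
\tSection (__TEXT, __text): 0x2ef (addr 0x0 offset 288)
\tSection (__TEXT, __const): 0x80 (addr 0x300 offset 1088)
\ttotal 0x36f
total 0x36f"""

for seg, sect, sect_end, seg_end in out_of_range_sections(SAMPLE):
    print(f"{seg},{sect} ends at {sect_end:#x}, segment ends at {seg_end:#x}")
```

For the output above, this reports that __TEXT,__const ends at 0x380 while the segment ends at 0x36f.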

find . -name 'peachpy*' -ls
230205417        0 drwxr-xr-x   21 davidlaxer       staff                 672 May 20 07:37 ./third_party/python-peachpy/peachpy
230205416       40 -rw-r--r--    1 davidlaxer       staff               19560 May 19 10:13 ./third_party/python-peachpy/logo/peachpy.png
230205498        8 -rw-r--r--    1 davidlaxer       staff                 864 May 19 10:13 ./third_party/python-peachpy/sphinx/peachpy.rst
230166126        0 drwxr-xr-x    5 davidlaxer       staff                 160 May 19 10:12 ./third_party/FP16/test/peachpy

@dbl001 have you updated the submodules? I landed the change about an hour ago; that should have fixed it.

Here it is #120
I'm not really sure whether the PyTorch build system still relies on the pre-generated branch (or why this script cannot be run during the build process, but whatever).

Is it still pinned to the old version in PyTorch?
E.g. #76094

I'm not sure I understand the question

I think this has been addressed.