KumaTea/pytorch-aarch64

Illegal instruction (core dumped) on Raspberry Pi 4B

unwind opened this issue · 11 comments

When running with the latest (1.9.0) wheel from here, as per the installation instructions, my project's Torch code crashes every time with an illegal instruction exception.

The top 10 stack levels looked like this:

(gdb) where
#0  0x0000ffffd286dfc8 in exec_blas ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#1  0x0000ffffd283f150 in gemm_driver ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#2  0x0000ffffd283fbd0 in sgemm_thread_nn ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#3  0x0000ffffd28385bc in sgemm_ () from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#4  0x0000ffffcfc38b8c in at::native::cpublas::gemm(at::native::cpublas::TransposeType, at::native::cpublas::TransposeType, long, long, long, float, float const*, long, float const*, long, float, float*, long) ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#5  0x0000ffffcfce5c48 in at::native::addmm_impl_cpu_(at::Tensor&, at::Tensor const&, at::Tensor, at::Tensor, c10::Scalar const&, c10::Scalar const&) () from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#6  0x0000ffffcfce68d0 in at::native::mm_cpu_out(at::Tensor const&, at::Tensor const&, at::Tensor&) ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#7  0x0000ffffcfce6a34 in at::native::mm_cpu(at::Tensor const&, at::Tensor const&) ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#8  0x0000ffffd056c784 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_mm>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#9  0x0000ffffd039b464 in at::redispatch::mm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) ()
   from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#10 0x0000ffffd1b5659c in torch::autograd::VariableType::(anonymous namespace)::mm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from lib/python3.8/site-packages/torch/lib/libtorch_cpu.so

Looking at the disassembly at the indicated location I got:

(gdb) disassemble
Dump of assembler code for function exec_blas:
   0x0000ffffd286df70 <+0>:     adrp    x2, 0xffffd40c6000
   0x0000ffffd286df74 <+4>:     stp     x29, x30, [sp, #-80]!
   0x0000ffffd286df78 <+8>:     mov     x29, sp
   0x0000ffffd286df7c <+12>:    ldr     x3, [x2, #2376]
   0x0000ffffd286df80 <+16>:    mov     x2, x0
   0x0000ffffd286df84 <+20>:    stp     x19, x20, [sp, #16]
   0x0000ffffd286df88 <+24>:    mov     x20, x1
   0x0000ffffd286df8c <+28>:    ldr     w0, [x3]
   0x0000ffffd286df90 <+32>:    cbz     w0, 0xffffd286e000 <exec_blas+144>
   0x0000ffffd286df94 <+36>:    cmp     x2, #0x0
   0x0000ffffd286df98 <+40>:    ccmp    x20, #0x0, #0x4, gt
   0x0000ffffd286df9c <+44>:    b.eq    0xffffd286dff0 <exec_blas+128>  // b.none
   0x0000ffffd286dfa0 <+48>:    adrp    x19, 0xffffd4150000 <memory+1984>
   0x0000ffffd286dfa4 <+52>:    add     x1, sp, #0x38
   0x0000ffffd286dfa8 <+56>:    add     x4, x19, #0x4f0
   0x0000ffffd286dfac <+60>:    mov     w0, #0x1                        // #1
   0x0000ffffd286dfb0 <+64>:    add     x4, x4, #0x40
   0x0000ffffd286dfb4 <+68>:    nop
   0x0000ffffd286dfb8 <+72>:    nop
   0x0000ffffd286dfbc <+76>:    nop
   0x0000ffffd286dfc0 <+80>:    strb    wzr, [sp, #56]
   0x0000ffffd286dfc4 <+84>:    mov     w3, #0x0                        // #0
=> 0x0000ffffd286dfc8 <+88>:    casalb  w3, w0, [x4]
   0x0000ffffd286dfcc <+92>:    cbnz    w3, 0xffffd286dfc0 <exec_blas+80>
   0x0000ffffd286dfd0 <+96>:    adrp    x0, 0xffffd286d000 <inner_thread+2192>
   0x0000ffffd286dfd4 <+100>:   stp     x2, x20, [sp, #56]
   0x0000ffffd286dfd8 <+104>:   add     x0, x0, #0xbcc
   0x0000ffffd286dfdc <+108>:   str     xzr, [sp, #72]

This seems to indicate that the culprit is the CASALB instruction, which as far as I can understand is an ARMv8.1 instruction, while the Raspberry Pi 4B's cores are only ARMv8.0-compliant.
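For what it's worth, a quick way to confirm this (my own suggestion, not part of the gdb session above) is to check whether the kernel advertises the ARMv8.1 LSE atomics that casalb belongs to; on the Pi 4B's Cortex-A72 (ARMv8.0-A) the "atomics" flag should be absent from /proc/cpuinfo:

# Check whether this CPU advertises ARMv8.1 LSE atomics; the kernel reports
# them as the "atomics" feature flag. casalb is one of those instructions,
# so if the flag is missing, executing it raises SIGILL.
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("Features"):
            flags = line.split(":", 1)[1].split()
            print("LSE atomics supported:", "atomics" in flags)
            break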

I hope this can be fixed, since building Torch myself seems daunting (and also since, assuming I'm right above, this is not the intended behavior).

Thanks for making this available.

Hi.

Currently PyTorch wheels for Python 3.6 - 3.9 are installed from the official PyPI source.
It's very likely that the PyTorch team builds on enterprise cloud VMs with ARM CPUs, which are ARMv8.2-based, so the wheels don't disable ARMv8.1 instructions.
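As a rough cross-check of that assumption (a sketch of mine, not something run in this thread), one could scan the installed libtorch_cpu.so for ARMv8.1 CAS instructions such as casalb; this assumes binutils' objdump is available and may take a few minutes on a library this large:

# Stream objdump's disassembly of libtorch_cpu.so and count casal*/casalb
# mnemonics. Adjust LIB to point at your own site-packages install.
import subprocess

LIB = "lib/python3.8/site-packages/torch/lib/libtorch_cpu.so"  # adjust path

proc = subprocess.Popen(["objdump", "-d", LIB], stdout=subprocess.PIPE, text=True)
hits = 0
for line in proc.stdout:
    fields = line.split("\t")
    # disassembly lines look like: "  addr:\t<bytes> \tcasalb\tw3, w0, [x4]"
    if len(fields) >= 3 and fields[2].strip().startswith("casal"):
        hits += 1
        if hits <= 5:
            print(line.rstrip())
proc.wait()
print(f"found {hits} casal*/casalb instructions")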

Do you have any sample code?
I don't know much C, but I'll try to compile from source if the problem reproduces, in a few days (my board is not available this week).

Thanks!

Hi.

Thanks for the rapid response. I'm not sure if I have code I can share; perhaps I can stitch something together, but it will be a while. This is my last week before going on vacation, and there are other things to do in the project.

Thanks!

Hi again.

Okay, here's an attempt at a reproduction case:

#!/usr/bin/env python3
import torch

w_bn = torch.randn(64, 64)
w_conv = torch.randn(64, 108)
w = torch.randn(64, 12, 3, 3)
# The mm() call below is where the process dies with an illegal instruction
w.copy_(torch.mm(w_bn, w_conv).view(w.size()))

This crashes with a core dump every time I run it. Apologies for the random-seeming dimensions; they are just what our project uses (I'm not the author of the PyTorch-using code in our project, so I lack deeper understanding).

I did not trace this down to the core dump, but I would say chances are pretty good this is the same crash. I now understand that the "mm" in the trace above refers to a matrix multiplication, and this line of code (which is the same as in our project, except of course the data has been replaced with random matrices) calls mm() and never returns.
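One diagnostic that might narrow this down (a suggestion on my side, not something tried in this thread): the backtrace above goes through OpenBLAS's threading driver (sgemm_thread_nn -> gemm_driver -> exec_blas), and exec_blas is where the casalb sits, so forcing a single BLAS thread before importing torch may show whether that threaded path is the trigger. This only tests the hypothesis; it is not a fix, and other ARMv8.1 instructions could remain elsewhere in the binary.

#!/usr/bin/env python3
# Same reproduction as above, but with OpenBLAS limited to one thread.
# The environment variable must be set before torch (and its bundled
# OpenBLAS) is imported, or it has no effect.
import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import torch

w_bn = torch.randn(64, 64)
w_conv = torch.randn(64, 108)
w = torch.randn(64, 12, 3, 3)
w.copy_(torch.mm(w_bn, w_conv).view(w.size()))
print("mm() completed without SIGILL")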

Good luck!

Hi, thank you for your replies!

I just tried your sample code on Python 3.8 (official wheel) and Python 3.10 (wheel from this repo). The results are:

On Python 3.8, bash reported Illegal instruction (core dumped) and exited,
and fish reported fish: Job 1, “python3” terminated by signal SIGILL (Illegal instruction) then exited.

On Python 3.10, it printed

tensor([[[[ 1.0143e+01, -1.2882e+01, -1.1660e+00],
          [ 1.1609e+01,  7.4942e+00,  3.3680e-01],
          [ 8.7291e+00,  1.7029e+01, -1.6758e+01]],

         ...

successfully.

I don't really understand what the code does, but I think this supports the assumption above.


I'll build wheels for 3.6 - 3.9 asap. Thank you again!

Okay great, feel free to drop me a line when you have wheels available and hopefully I can test, too.

Thanks!

Hi,
I have the same problem on my Raspberry Pi 4B with python 3.8.10.
Sadly after installing the updated wheel, the situation is the same.

Hi, could you provide your error report(s) and sample code?

I've tried the code above on that wheel, but it worked normally. Could it be some difference in your code that causes the problem?

Thanks!

Hmm. I tested the same code and it still hits the problematic casalb instruction. The official torch version 1.8.1 is working.

Oh, I see. After upgrading again it is working. Maybe the problem was that I didn't uninstall the official 1.9.0 before installing this version.
Thanks!

Hi!

It does seem to resolve the issue for me on my Raspberry target. I had to (as you say) download your wheel manually and pip install it directly from the file, but that was expected and worked well.

Thanks!

Hi, I have the same issue here, but it happens when I try to feed an image to my model. I'm using one of the existing models in PyTorch.

I tried to download the wheel manually and install it, but I got an error indicating that torch-1.9.0-cp310-cp310-linux_aarch64.whl is not a supported wheel on this platform. I tried the cp36 linux and cp36 manylinux files too, but got the same result.

I have tried multiple models; one of them is as follows:

from torchvision import models
net = models.quantization.mobilenet_v2(pretrained=True)

@unwind perhaps you can let me know which wheel file you tried. Thanks!

I'm using Python 3.9.2.
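For what it's worth: on Python 3.9.2 a cp310 wheel will always be rejected, because the cp310 tag means CPython 3.10; the file to pick is the cp39 one for your platform. If in doubt, pip debug --verbose lists the tags pip accepts, or the same can be done in Python with the packaging library (which usually ships alongside pip); the snippet below is a suggestion of mine, not from the thread:

# Print the wheel tags this interpreter accepts. On a 64-bit OS with
# Python 3.9 they start with cp39-cp39-..._aarch64, which is why a
# cp310 wheel is reported as "not a supported wheel on this platform".
from packaging.tags import sys_tags

for tag in list(sys_tags())[:10]:
    print(tag)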