dubhater/vapoursynth-nnedi3

FMA4 used even if unsupported

olifre opened this issue · 6 comments

With current git HEAD, FMA4 is used (unconditionally?) and I get, on my not-so-very-old Haswell machine:
Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7ffea7b64700 (LWP 17272)]
0x00007ffea8423378 in nnedi3_e0_m16_FMA4 () from /usr/lib64/libnnedi3.so

If you need further info (/proc/cpuinfo etc.) just ask.

Please paste the output of cpuid --one-cpu --raw (http://www.etallen.com/cpuid.html). The only thing I can think of is that I'm doing the CPU feature detection slightly wrong. I need that output to verify.

Here you go:

CPU:
   0x00000000 0x00: eax=0x0000000d ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x000306c3 ebx=0x07100800 ecx=0x7ffafbff edx=0xbfebfbff
   0x00000002 0x00: eax=0x76036301 ebx=0x00f0b5ff ecx=0x00000000 edx=0x00c10000
   0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000004 0x00: eax=0x1c004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x01: eax=0x1c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x02: eax=0x1c004143 ebx=0x01c0003f ecx=0x000001ff edx=0x00000000
   0x00000004 0x03: eax=0x1c03c163 ebx=0x03c0003f ecx=0x00001fff edx=0x00000006
   0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x00042120
   0x00000006 0x00: eax=0x00000077 ebx=0x00000002 ecx=0x00000009 edx=0x00000000
   0x00000007 0x00: eax=0x00000000 ebx=0x000027ab ecx=0x00000000 edx=0x00000000
   0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000a 0x00: eax=0x07300403 ebx=0x00000000 ecx=0x00000000 edx=0x00000603
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000007
   0x0000000b 0x01: eax=0x00000004 ebx=0x00000008 ecx=0x00000201 edx=0x00000007
   0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340 edx=0x00000000
   0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x3e: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000021 edx=0x2c100800
   0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x726f4320 edx=0x4d542865
   0x80000003 0x00: eax=0x37692029 ebx=0x3139342d ecx=0x20514d30 edx=0x20555043
   0x80000004 0x00: eax=0x2e322040 ebx=0x48473039 ecx=0x0000007a edx=0x00000000
   0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x01006040 edx=0x00000000
   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
   0x80000008 0x00: eax=0x00003027 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80860000 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340 edx=0x00000000
   0xc0000000 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340 edx=0x00000000

It's an Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz, by the way (which turbos up to 4.1 GHz).
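The raw leaves can be cross-checked against that model name. What follows is only a minimal sketch, assuming the usual CPUID layout: the vendor string sits in leaf 0 (in the order ebx, edx, ecx), the brand string in leaves 0x80000002 to 0x80000004 (eax, ebx, ecx, edx), and family/model/stepping are packed into eax of leaf 1. The helper names `parse_raw` and `regs_to_text` are made up for this illustration and are not part of cpuid or the plugin; the input lines are copied from the dump above.

```python
import re
import struct

# Lines copied from the dump above (only the leaves needed here).
RAW = """
   0x00000000 0x00: eax=0x0000000d ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x000306c3 ebx=0x07100800 ecx=0x7ffafbff edx=0xbfebfbff
   0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x726f4320 edx=0x4d542865
   0x80000003 0x00: eax=0x37692029 ebx=0x3139342d ecx=0x20514d30 edx=0x20555043
   0x80000004 0x00: eax=0x2e322040 ebx=0x48473039 ecx=0x0000007a edx=0x00000000
"""

# One dump line: leaf, subleaf, then the four registers.
LINE = re.compile(
    r"(0x[0-9a-f]{8}) (0x[0-9a-f]{2}): "
    r"eax=(0x[0-9a-f]{8}) ebx=(0x[0-9a-f]{8}) "
    r"ecx=(0x[0-9a-f]{8}) edx=(0x[0-9a-f]{8})"
)

def parse_raw(text):
    """Map (leaf, subleaf) to a dict of register values."""
    leaves = {}
    for m in LINE.finditer(text):
        leaf, sub, eax, ebx, ecx, edx = (int(g, 16) for g in m.groups())
        leaves[(leaf, sub)] = {"eax": eax, "ebx": ebx, "ecx": ecx, "edx": edx}
    return leaves

def regs_to_text(values):
    """Each register holds four ASCII bytes in little-endian order."""
    raw = b"".join(struct.pack("<I", v) for v in values)
    return raw.rstrip(b"\x00").decode("ascii").strip()

leaves = parse_raw(RAW)

leaf0 = leaves[(0, 0)]
vendor = regs_to_text([leaf0["ebx"], leaf0["edx"], leaf0["ecx"]])

brand = regs_to_text([
    leaves[(leaf, 0)][reg]
    for leaf in (0x80000002, 0x80000003, 0x80000004)
    for reg in ("eax", "ebx", "ecx", "edx")
])

# Family/model/stepping from leaf 1 eax; the extended fields only
# apply for certain base family values.
eax = leaves[(1, 0)]["eax"]
stepping = eax & 0xF
base_family = (eax >> 8) & 0xF
family = base_family + (((eax >> 20) & 0xFF) if base_family == 0xF else 0)
model = (eax >> 4) & 0xF
if base_family in (0x6, 0xF):
    model |= ((eax >> 16) & 0xF) << 4

print(vendor)
print(brand)
print(f"family {family}, model 0x{model:02x}, stepping {stepping}")
```

For this dump the sketch reports GenuineIntel, the i7-4910MQ brand string, and family 6, model 0x3c, which is a Haswell client part, so the dump does come from the machine described.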

Okay. The FMA4 bit (bit 16 of ecx in leaf 0x80000001, which reads 0x00000021 here) is not set, so that's not it. I suppose you'll just have to use a debugger to find out what happens, because I don't see anything wrong in the code. Do you need detailed instructions for using gdb?
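For anyone who wants to repeat that check by hand, here is a small sketch that tests the relevant feature bits in the pasted register values. The bit positions are the ones documented in the vendor CPUID references (FMA4 is ecx bit 16 of leaf 0x80000001; FMA3, OSXSAVE and AVX are ecx bits 12, 27 and 28 of leaf 1; AVX2 is ebx bit 5 of leaf 7). This is not the plugin's detection code, and the `bit` helper and constant names are invented for the example.

```python
# Register values copied from the dump above.
LEAF_1_ECX = 0x7FFAFBFF     # leaf 0x00000001
LEAF_7_EBX = 0x000027AB     # leaf 0x00000007, subleaf 0
LEAF_EXT1_ECX = 0x00000021  # leaf 0x80000001

def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)

features = {
    "OSXSAVE": bit(LEAF_1_ECX, 27),
    "AVX": bit(LEAF_1_ECX, 28),
    "FMA3": bit(LEAF_1_ECX, 12),
    "AVX2": bit(LEAF_7_EBX, 5),
    "FMA4": bit(LEAF_EXT1_ECX, 16),
}

for name, present in features.items():
    print(f"{name:8s} {'yes' if present else 'no'}")
```

With these values only FMA4 comes out as absent, which fits a Haswell CPU: it implements the three-operand FMA extension but not FMA4. A complete runtime check would also confirm through XGETBV that the OS has enabled the YMM state, which cannot be read from this dump.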

Well, this is extremely strange... I just tried it again, on the same machine, with the same video and the same vapoursynth script, inside mpv, and I no longer see the SIGILL from nnedi3 running FMA4.

The microcode should not have changed, and I did not recompile either vapoursynth or the nnedi3 plugin.

With valgrind, I see only tons of python3 invalid reads (probably false positives that need a suppression file; I am not that experienced with Python itself) and nothing from the nnedi3 library.

I am out of explanations, but this was reproducible just yesterday on the same machine...

Let's assume the problem fixed itself. :)

You're right, let's assume that for now - if it ever pops up again, I'll be sure to gdb it directly ;-).