AviSynth/AviSynthPlus

LLVM (clang-cl, Intel Nextgen) builds crash on pre-AVX CPUs

pinterf opened this issue · 3 comments

As the title says, any attempt to load such Avisynth+ builds from VirtualDub or, avsmeter dies prematurely.

thanks for @jpsdr for the report, and for @DTL2020 who pointed out on the exact case: "illegal instruction"

Error occured only in release builds.

Problem's reason summary: Such a construct (see below), when appears in an AVX2 source module, will get static initialized before anything can happen in AviSynth. This init routine will get called even on processors having no AVX/AVX2 instruction set.
Compiler must initialize their statically declared data (this time it was an array initialization, a table for dithering - e.g. a memory copy from a constant table to fill the class' data_sse2 array)

In convert_bit.h (which was included in convert_bits_avx2.cpp):

static const struct dither2x2a_t
{
  const BYTE data[4] = {
    0, 1,
    1, 0,
  };
  // cycle: 2
  alignas(16) const BYTE data_sse2[2 * 16] = {
    0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
    1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
  };
  dither2x2a_t() {};
} dither2x2a;

The illegal instruction came from
dither2x2a_t() {}

which triggered the static initialization. And since

  • static initialization must occur (automatically) after loading the DLL
  • this initialization was in an avx2 module
  • LLVM compiler optimized the copy using v-prefixed intruction set and 32 bit ymm registers

So it crashed on an SSE4 computer.

After installing Visual Studio on an i7-860 (pre-AVX) machine, ReleaseWithDebug build showed the crash disassembly:

AviSynth.dll!_GLOBAL__sub_I_convert_bits_avx2.cpp(void):
00007FFBF987F4E0  mov         dword ptr [dither2x2a (07FFBF9B03620h)],10100h  
00007FFBF987F4EA  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+10h (07FFBF9A3CF60h)]  
00007FFBF987F4F2  vmovups     ymmword ptr [dither2x2a+10h (07FFBF9B03630h)],ymm0  
00007FFBF987F4FA  mov         dword ptr [dither2x2 (07FFBF9B03650h)],1030200h  
00007FFBF987F504  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+30h (07FFBF9A3CF80h)]  
00007FFBF987F50C  vmovups     ymmword ptr [dither2x2+10h (07FFBF9B03660h)],ymm0  
00007FFBF987F514  vmovaps     xmm0,xmmword ptr [__xmm@02060307040005010307020605010400 (07FFBF991D470h)]  
00007FFBF987F51C  vmovaps     xmmword ptr [dither4x4a (07FFBF9B03680h)],xmm0  
00007FFBF987F524  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+50h (07FFBF9A3CFA0h)]  
00007FFBF987F52C  vmovups     ymmword ptr [dither4x4a+10h (07FFBF9B03690h)],ymm0  
00007FFBF987F534  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+70h (07FFBF9A3CFC0h)]  
etc.

This helped to locate the exact position of the problem.

So we have to take care not using static class inititialing in the AVX2 (in general: other than the minimal CPU arch) compiled source.

Such predefined tables are now using extern const in header, and extern const + definition once, in the base-CPU-arch compiled module: convert_bits.cpp. convert_bits_sse.cpp and convert_bits_avx2.cpp are using them without any duplication and initialization need.

When I generated an asm list output, I could see any more one_only,_GLOBAL__sub_I_ + fn_name occurence in *_avx2.asm files.

I reopen it, since this is a blind fix, until I get feedback or I can try it after I get to my ancient i7-860 PC again in some days.

Feedback received.
https://forum.doom9.org/showthread.php?p=1984827#post1984827
Closing it. I learned a lot.