LLVM (clang-cl, Intel Nextgen) builds crash on pre-AVX CPUs
pinterf opened this issue · 3 comments
As the title says, any attempt to load such Avisynth+ builds from VirtualDub or avsmeter dies prematurely.
Thanks to @jpsdr for the report, and to @DTL2020, who pinpointed the exact cause: "illegal instruction".
The error occurred only in release builds.
Summary of the cause: when a construct like the one below appears in an AVX2 source module, it gets statically initialized before anything else can happen in AviSynth. This init routine is called even on processors that have no AVX/AVX2 instruction set.
The compiler must initialize statically declared data (here it was an array initialization, a table for dithering: e.g. a memory copy from a constant table to fill the class's data_sse2 array).
In convert_bit.h (which was included in convert_bits_avx2.cpp):
static const struct dither2x2a_t
{
const BYTE data[4] = {
0, 1,
1, 0,
};
// cycle: 2
alignas(16) const BYTE data_sse2[2 * 16] = {
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
};
dither2x2a_t() {};
} dither2x2a;
The illegal instruction came from
dither2x2a_t() {}
which triggered the static initialization. The crash followed from three facts:
- static initialization must occur (automatically) right after the DLL is loaded
- this initialization was in an AVX2 module
- the LLVM compiler optimized the copy using v-prefixed (VEX-encoded) instructions and 32-byte (256-bit) ymm registers
So it crashed on an SSE4 computer.
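To illustrate why the constructor matters, here is a minimal sketch (not the AviSynth+ code; `table_t`, `g_ctor_ran` and `ctor_ran_at_load()` are hypothetical names). An object at namespace scope with a user-provided, non-constexpr constructor can get a dynamic initializer, and that initializer runs at load time, before any CPU feature check in the program has a chance to execute:

```cpp
#include <cassert>

// Hypothetical flag; it is constant-initialized to false before any
// dynamic initializer runs.
static bool g_ctor_ran = false;

struct table_t {
    const unsigned char data[4] = {0, 1, 1, 0};
    // User-provided, non-constexpr constructor, like the one in the issue.
    // The side effect stands in for the table copy the compiler generated.
    table_t() { g_ctor_ran = true; }
};

// Namespace-scope object: its constructor runs while the program or DLL
// is being loaded, not when the table is first used.
static const table_t table;

// Returns true if the constructor has already run at the time of the call.
bool ctor_ran_at_load() {
    return g_ctor_ran && table.data[1] == 1;
}
```

Called from `main()` or from any code that runs after loading, `ctor_ran_at_load()` returns true without anyone having touched `table` first. If this translation unit is compiled with AVX2 code generation enabled, the compiler is free to use AVX instructions in that load-time code.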
After installing Visual Studio on an i7-860 (pre-AVX) machine, a ReleaseWithDebug build showed the disassembly at the crash:
AviSynth.dll!_GLOBAL__sub_I_convert_bits_avx2.cpp(void):
00007FFBF987F4E0 mov dword ptr [dither2x2a (07FFBF9B03620h)],10100h
00007FFBF987F4EA vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+10h (07FFBF9A3CF60h)]
00007FFBF987F4F2 vmovups ymmword ptr [dither2x2a+10h (07FFBF9B03630h)],ymm0
00007FFBF987F4FA mov dword ptr [dither2x2 (07FFBF9B03650h)],1030200h
00007FFBF987F504 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+30h (07FFBF9A3CF80h)]
00007FFBF987F50C vmovups ymmword ptr [dither2x2+10h (07FFBF9B03660h)],ymm0
00007FFBF987F514 vmovaps xmm0,xmmword ptr [__xmm@02060307040005010307020605010400 (07FFBF991D470h)]
00007FFBF987F51C vmovaps xmmword ptr [dither4x4a (07FFBF9B03680h)],xmm0
00007FFBF987F524 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+50h (07FFBF9A3CFA0h)]
00007FFBF987F52C vmovups ymmword ptr [dither4x4a+10h (07FFBF9B03690h)],ymm0
00007FFBF987F534 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+70h (07FFBF9A3CFC0h)]
etc.
This helped to locate the exact position of the problem.
So we have to take care not to use static class initialization in source compiled for AVX2 (in general: for anything other than the minimal CPU arch).
Such predefined tables are now declared extern const in the header and defined exactly once, in the module compiled for the base CPU arch: convert_bits.cpp. convert_bits_sse.cpp and convert_bits_avx2.cpp use them without any duplication or need for initialization.
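The pattern can be sketched roughly as follows. This is a single-file illustration under stated assumptions, not the actual AviSynth+ source: the struct layout is simplified to a plain aggregate, `BYTE` is replaced by `unsigned char`, and `first_sse2_row_sum()` is a hypothetical consumer. The comments mark which file each part would live in.

```cpp
#include <cassert>
#include <cstdint>

// ---- header (declaration only, seen by every module) ----
struct dither2x2a_t {
    unsigned char data[4];
    alignas(16) unsigned char data_sse2[2 * 16];
};
extern const dither2x2a_t dither2x2a;  // no storage, no initializer here

// ---- base-CPU-arch module (the one and only definition) ----
// Aggregate initialization from constants: the table can be emitted as
// read-only data, so no load-time initializer function is needed.
const dither2x2a_t dither2x2a = {
    {0, 1,
     1, 0},
    {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
     1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0}
};

// ---- SSE/AVX2 module (hypothetical consumer, only reads the table) ----
int first_sse2_row_sum() {
    int sum = 0;
    for (int i = 0; i < 16; ++i) sum += dither2x2a.data_sse2[i];
    return sum;  // first row alternates 0,1 so the sum is 8
}
```

The design point is that the header no longer causes each including module to get its own copy of the object, so the modules compiled for higher instruction sets have nothing of their own to initialize.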
When I generated an asm listing, I could no longer see any one_only,_GLOBAL__sub_I_ + fn_name occurrences in the *_avx2.asm files.
I am reopening this, since it is a blind fix until I get feedback or can try it myself once I get back to my ancient i7-860 PC in a few days.
Feedback received.
https://forum.doom9.org/showthread.php?p=1984827#post1984827
Closing it. I learned a lot.