ashvardanian/StringZilla

sz_capabilities might be incorrect for AVX512

Closed this issue ยท 4 comments

The sz_capabilities function seems to return incorrect values for certain AVX512 instruction sets.

lscpu gives:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       36 bits physical, 48 bits virtual
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz
Stepping:            3
CPU MHz:             2496.000
CPU max MHz:         2496.0000
BogoMIPS:            4992.00
Hypervisor vendor:   Windows Subsystem for Linux
Virtualization type: container
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
                        sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm pni pclmulqdq dtes64 est tm2 ssse3 fma cx16 xtpr pdcm
                        pcid sse4_1 sse4_2 movbe popcnt aes xsave osxsave avx f16c rdrand hypervisor lahf_lm abm
                        3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap
                        clflushopt ibrs ibpb stibp ssbd

But the python stringzilla.__capabilities__ gives:

'serial,avx2,avx512vbmi,gfni,'

If I understand cpuid correctly, info1 refers to calling cpuid with eax==1 (From https://en.wikipedia.org/wiki/CPUID).
Then the following code in sz_capabilites is incorrect.

StringZilla/c/lib.c

Lines 78 to 83 in 1a4b05d

// https://github.com/llvm/llvm-project/blob/50598f0ff44f3a4e75706f8c53f3380fe7faa896/clang/lib/Headers/cpuid.h#L171C30-L171C40
unsigned supports_avx512vbmi = (info1.named.ecx & 0x00000002) != 0;
unsigned supports_avx512vbmi2 = (info1.named.ecx & 0x00000040) != 0;
// Check for GFNI (Function ID 1, ECX register)
// https://github.com/llvm/llvm-project/blob/50598f0ff44f3a4e75706f8c53f3380fe7faa896/clang/lib/Headers/cpuid.h#L177C30-L177C40
unsigned supports_gfni = (info1.named.ecx & 0x00000100) != 0;

So the info1s should instead be info7s.
As currently:

  • supports_avx512vbmi: Actually represents pclmulqdq (Supported).
  • supports_avx512vbmi2: Actually represents smx (Not Supported).
  • supports_gfni: Actually represents tm2 (Supported).

Yes, @ashbob999! You may be right. If you have a patch, will be happy to test on my end as well ๐Ÿค—

I haven't made a patch yet, but it should be simply enough to do.

Just replace the 3 info1s with info7s.

I can create a PR with the patch, and test it myself, but it won't be till the weekend.

Yes, that fixes the issue! It would be great if we had similar CPU-capability checks for Arm.

๐ŸŽ‰ This issue has been resolved in version 3.8.2 ๐ŸŽ‰

The release is available on GitHub release

Your semantic-release bot ๐Ÿ“ฆ๐Ÿš€