cathugger/mkp224o

Segfault on Windows with amd64-51-30k or amd64-64-24k

Closed · 4 comments

On both d5b90d4 and 7f714ee, I get a segfault with either the amd64-51-30k or the amd64-64-24k implementation. ref10, donna, and donna-sse2 all work fine.

Environment:

Msys2 (either clang64 or gcc64) MINGW64_NT-10.0-19044
gcc version 11.3.0 (Rev1, Built by MSYS2 project)
clang version 14.0.3
libsodium 1.0.18-2
CC="clang"
CFLAGS="-march=native -Og -g -pipe -fomit-frame-pointer"
MAKEFLAGS="-j$(nproc)"

Build steps:

make clean
./autogen.sh
./configure --enable-regex --with-pcre2="/clang64/bin/pcre2-config" --enable-[amd64-51-30k|amd64-64-24k] --enable-intfilter [--enable-binsearch --enable-besort]
make

Result:

gdb --args ./mkp224o.exe -B -s -T test

(gdb) run
Starting program: C:\msys64\home\Adam\mkp224o\mkp224o.exe -B -s -T test
[New Thread 53912.0x121ac]
[New Thread 53912.0x16d5c]
[New Thread 53912.0x14cf8]
filters:
        test
in total, 1 filter
using 8 threads
[New Thread 53912.0xda48]
[New Thread 53912.0x15ab0]
[New Thread 53912.0x132fc]
[New Thread 53912.0xb548]
[New Thread 53912.0xb45c]
[New Thread 53912.0xcad0]
[New Thread 53912.0x10c2c]
[New Thread 53912.0x10190]

Thread 5 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 53912.0xda48]
0x00007ff73fb790b6 in crypto_sign_ed25519_amd64_51_30k_batch_choose_t ()

I have yet to try with WSL2.

Trying with WSL2:
Cross-compiling builds fine, as under Msys2, but the result still segfaults with those two implementations.
Compiling natively within WSL2 builds and runs fine (and is faster than the implementations that don't segfault).

Still, native Windows would be preferable to avoid the slight CPU overhead and moderate RAM overhead of WSL2.

I think for Windows, amd64-51-30k and amd64-64-24k never worked correctly, and donna makes more sense. It probably has something to do with calling-convention (ABI) differences or something along those lines. That's why I've always built Windows binaries with donna.

I think I've just fixed it in 4e20f08.

Yep. Now it runs at the same speed as under WSL (~2x donna) without the ~50x RAM usage!

For any Windows users, until #38 is implemented, on an i7-2600k the best performance I get comes from:
CFLAGS="-march=native -O3 -fno-plt -flto -fomit-frame-pointer"
./configure --enable-regex --with-pcre2="/[mingw64|clang64]/bin/pcre2-config" --enable-amd64-64-24k --enable-intfilter --enable-binsearch --enable-besort

I see no appreciable performance difference between GCC and Clang.
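
For anyone who wants to script the configuration quoted above, here is a minimal sketch. It only assembles the configure arguments and exports the compiler flags; it does not run the build itself. The variable name MSYS2_ENV and the function name mkp224o_configure_args are made up for illustration, and the leading slash in the pcre2-config path is an assumption based on the path given in the original build steps.

```shell
#!/bin/sh
# Sketch only: collects the settings quoted in this thread in one place.
# MSYS2_ENV is a hypothetical variable; set it to mingw64 or clang64.
MSYS2_ENV="${MSYS2_ENV:-clang64}"

# Compiler flags reported in this thread as fastest on an i7-2600k.
CFLAGS="-march=native -O3 -fno-plt -flto -fomit-frame-pointer"
export CFLAGS

# Print one configure argument per line for the chosen ed25519
# implementation (defaults to amd64-64-24k, as recommended above).
# The pcre2-config path assumes MSYS2 prefixes such as /clang64.
mkp224o_configure_args() {
    impl="${1:-amd64-64-24k}"
    printf '%s\n' \
        "--enable-regex" \
        "--with-pcre2=/${MSYS2_ENV}/bin/pcre2-config" \
        "--enable-${impl}" \
        "--enable-intfilter" \
        "--enable-binsearch" \
        "--enable-besort"
}

# Usage, from the mkp224o source tree:
#   ./autogen.sh
#   ./configure $(mkp224o_configure_args amd64-64-24k)
#   make
```

Passing a different name, such as amd64-51-30k, swaps only the --enable-<implementation> argument, which makes it easy to compare implementations with otherwise identical settings.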