SSMFE C interface segfaults on Windows
jfowkes opened this issue · 15 comments
Moving over to meson has enabled us to test on Windows and this has exposed a segfault in the SSMFE C interface:
test: ssmfet_c
start time: 14:05:58
duration: 0.05s
result: (exit status 3221225725 or signal 3221225597 SIGinvalid)
Not entirely sure what these strangely large exit statuses mean. @amontoison?
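One way to read numbers like these: on Windows, a process killed by an unhandled exception usually exits with the NTSTATUS code of that exception, so the value is easier to recognise in hexadecimal. The sketch below is a hypothetical helper (`format_status` is not part of SPRAL or meson) that prints a status in that form. Applied to 3221225725 it gives 0xC00000FD, which Microsoft documents as STATUS_STACK_OVERFLOW. The second number, 3221225597, is exactly 128 less, which looks like the test runner applying the Unix "status minus 128" signal convention to the same value; that part is an assumption.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 32-bit exit status the way NTSTATUS
   codes are normally written, e.g. 0xC00000FD. */
static void format_status(uint32_t status, char *buf, size_t len) {
    snprintf(buf, len, "0x%08" PRIX32, status);
}
```

Calling `format_status(3221225725u, buf, sizeof buf)` with a 16-byte buffer leaves the string `0xC00000FD` in `buf`.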
@jfowkes I tried to find something with the highest warning level in #159 but I found nothing :(
Maybe you could try ralna/GALAHAD#108?
@amontoison good shout, will see if I can run some sanitisers...
@amontoison I'm getting:
FAILED: libspral.dll
"gfortran" @libspral.dll.rsp
c:/programdata/chocolatey/lib/mingw/tools/install/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
cannot find -lasan: No such file or directory
c:/programdata/chocolatey/lib/mingw/tools/install/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
cannot find -lubsan: No such file or directory
I guess the sanitizers don't work on Windows?
I checked online, and the sanitizers do not work with GCC on Windows.
https://stackoverflow.com/questions/55018627/cannot-find-lasan-using-address-sanitizer-in-mingw-in-windows-mingw
Maybe we should split ssmfet_c into smaller unit tests to isolate the issue?
Good plan! I will have a go next week at splitting up the ssmfet_c tests to try and isolate the issue.
@amontoison here is the C main function for the SSMFE test:
int main(void) {
    int errors = 0;
    int err;

    fprintf(stdout, "testing ssmfe_core...\n");
    err = test_core();
    errors += err;
    fprintf(stdout, "%d errors\n", err);

    fprintf(stdout, "testing ssmfe_expert...\n");
    err = test_expert();
    errors += err;
    fprintf(stdout, "%d errors\n", err);

    fprintf(stdout, "testing ssmfe...\n");
    err = test_ssmfe();
    errors += err;
    fprintf(stdout, "%d errors\n", err);

    fprintf(stdout, "=============================\n");
    fprintf(stdout, "Total number of errors = %d\n", errors);
    return errors;
}
Why are we not seeing the first print line (testing ssmfe_core...) in the logs on Windows? Is this because the test errors out before even getting to this line?
Can you comment out the first test, test_core()?
I suspect that test_core() fails and the value err is never set by this function.
Indeed, that appears to be the case. I've just flushed the print statements in main and I get:
----------------------------------- stdout -----------------------------------
testing ssmfe_core...
before it crashes. I will add some more flushes to test_core to try to isolate the issue.
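For reference, the flushing approach can be sketched as below. When stdout is redirected to a pipe or file, as it is under a test runner, the C library normally buffers it fully, so text printed before a crash can be lost unless it is flushed first. `checkpoint` is a hypothetical helper name, not something in the SPRAL test suite.

```c
#include <stdio.h>

/* Hypothetical helper: print a progress marker and flush it right away,
   so the line reaches the log even if the process crashes afterwards.
   Returns the number of characters written, as fprintf does. */
static int checkpoint(FILE *out, const char *label) {
    int written = fprintf(out, "%s\n", label);
    fflush(out);
    return written;
}
```

A call such as `checkpoint(stdout, "testing ssmfe_core...");` before each suspect statement narrows down the crash site. An alternative is to disable buffering once at the top of main with `setvbuf(stdout, NULL, _IONBF, 0);`.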
@amontoison okay, I have tracked this issue down to the following VLA allocation in the test_core_z double complex test routine:
double complex X[n][n]; /* eigenvectors storage */
where n=400, so this tries to allocate a 400x400 double complex VLA. So it looks like we're getting a stack overflow. Is the Windows stack just really tiny or something??
EDIT: according to my calculations the size of X is only 2.56 MB!
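The arithmetic can be checked with a short sketch (`vla_bytes` is a hypothetical name). It assumes `sizeof(double complex)` is 16, which holds on x86_64. The result for n=400 is 2,560,000 bytes. That is small in absolute terms, but Windows executables typically get a default main-thread stack of only 1 MB to 2 MB depending on the toolchain (an assumption here; the actual reserve for this build was not checked), so a 2.56 MB automatic array would not fit.

```c
#include <complex.h>
#include <stddef.h>

/* Bytes needed for an n-by-n array of double complex,
   i.e. the storage behind `double complex X[n][n]`. */
static size_t vla_bytes(size_t n) {
    return n * n * sizeof(double complex);
}
```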
I checked online a bit, and it seems VLAs might not be supported by default without the preprocessor flag __STDC_VLA__.
I think it's actually the other way around:
https://stackoverflow.com/questions/66246821/what-is-the-motivation-behind-stdc-negative-definitions-for-example-stdc-no-v
So what you're saying is that on Windows MinGW defines __STDC_NO_VLA__? I find that hard to believe...
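This can be checked directly rather than guessed. C11 makes VLAs an optional feature, and an implementation that does not support them defines `__STDC_NO_VLA__` as 1. The sketch below (the function name is hypothetical) reports which case applies to the compiler at hand. Note that if the macro were defined, a declaration like `double complex X[n][n]` would be expected to fail at compile time, not crash at run time.

```c
/* Returns 1 if the compiler advertises VLA support,
   0 if it defines __STDC_NO_VLA__ to opt out (C11 and later). */
static int compiler_supports_vla(void) {
#ifdef __STDC_NO_VLA__
    return 0;
#else
    return 1;
#endif
}
```

Printing the result of `compiler_supports_vla()` from a small program built with the same MinGW gcc used for the tests would settle the question.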
VLAs are not supported by MSVC, so that could explain why gcc on Windows defines it.
Is it not possible to remove VLAs?
https://en.m.wikipedia.org/wiki/Variable-length_array
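A minimal sketch of what removing the VLA could look like, assuming the test only needs n*n contiguous `double complex` values: allocate the storage on the heap and index it by hand. `alloc_square` and `idx` are hypothetical names, not part of the SPRAL tests, and the row-major indexing matches the memory layout of the original `X[n][n]` declaration.

```c
#include <complex.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-initialised n-by-n matrix of
   double complex on the heap. Returns NULL if the allocation fails. */
static double complex *alloc_square(size_t n) {
    return calloc(n * n, sizeof(double complex));
}

/* Row-major offset of element (i, j), i.e. the same position that
   X[i][j] has in `double complex X[n][n]`. */
static size_t idx(size_t n, size_t i, size_t j) {
    return i * n + j;
}
```

The caller would write `double complex *X = alloc_square(n);`, check the pointer for NULL, access elements as `X[idx(n, i, j)]`, and call `free(X)` at the end of the test. This avoids both the stack limit and any dependence on compiler VLA support.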