amosnier/sha-2

Large messages unusable (too big for stack allocation)

Opened this issue · 10 comments

I couldn't compile for some time... Took me a few minutes to realize that line 44 in test.c with define
#define LARGE_MESSAGES 1
won't really work on Windows with its 1mb stack limit... After changing it to 0, everything compiled as it should.

It should be noted somewhere in ReadMe or made compile-time specific for Windows (don't allow it at all or something).
Other solution would be to make it allocate on heap at run-time.

EDIT: I quite misread the actual sizes. 1gb on stack? Wow, that is some giant stack...
These should definitely be heap-allocated!

I wonder if you interpret what you see in the correct way. Do you actually mean that "compiling" seems to take "for ever"? If you use my Makefile, that is not surprising at all. Since I use Travis in the simplest possible way, I just run the tests as part of the make instructions. On Travis or on my machine, since there are several Gigabytes of test data, that takes a few minutes to run! You can adapt the Makefile to whatever you like (as I explicitly mention in the README file).

If I misunderstand your issue, you will have to tell me more. The large data is in the BSS segment. I had a quick look at the code again, and I do not believe that it ever ends up on the stack. Only data pointers are passed to the various functions. Also, the current max stack size on my machine is 8 MB I believe, and I have no problem running make on my SHA-2 project (but it takes a few minutes to complete with the large messages).

Should I close this issue?

No, it doesn't want to build when large messages are enabled.
If it helps, I'm currently building on Windows 10, 64-bit, latest updates installed on MSYS2's MinGW compiler (also with latest packages and toolsets available), utilizing GCC 7.3.0 and compiling for 64-bit arch.

It fails in linking stage due to it being enabled.
Maybe I'm misinterpreting what is the actual issue here as you wrote, could be some compiler-specific (or better said linker-specific) problem here.

Anyway, without modifying code to allocate those arrays on heap at run-time or disabling that chunk alltogether, I'm unable to compile.

So what is the linker error message?

$ make
cc -Wall -Wextra -Wpedantic   -c -o test.o test.c
cc -Wall -Wextra -Wpedantic   -c -o sha-256.o sha-256.c
cc   test.o sha-256.o   -o test
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: in function `__tmainCRTStartup':
C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:251:(.text+0x1cd): relocation truncated to fit: R_X86_64_PC32 against symbol `__imp_Sleep' defined in .idata$5 section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libkernel32.a(dkhchs01360.o)
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:278:(.text+0x255): relocation truncated to fit: R_X86_64_PC32 against symbol `__imp_SetUnhandledExceptionFilter' defined in .idata$5 section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libkernel32.a(dkhchs01346.o)
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:286:(.text+0x283): relocation truncated to fit: R_X86_64_PC32 against symbol `__mingw_winmain_hInstance' defined in COMMON section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:312:(.text+0x2f3): relocation truncated to fit: R_X86_64_PC32 against symbol `__mingw_winmain_lpCmdLine' defined in COMMON section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:238:(.text+0x475): relocation truncated to fit: R_X86_64_PC32 against symbol `__imp_GetStartupInfoA' defined in .idata$5 section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libkernel32.a(dkhchs00721.o)
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-gccmain.o): in function `__main':
C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:74:(.text+0xb2): relocation truncated to fit: R_X86_64_PC32 against `.bss'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:76:(.text+0xc2): relocation truncated to fit: R_X86_64_PC32 against `.bss'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-charmax.o): in function `my_lconv_init':
C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/charmax.c:19:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against symbol `__imp___lconv_init' defined in .idata$5 section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmsvcrt.a(dkxcbs00090.o)
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-gs_support.o): in function `__security_init_cookie':
C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gs_support.c:62:(.text+0x47): relocation truncated to fit: R_X86_64_PC32 against symbol `__imp_GetSystemTimeAsFileTime' defined in .idata$5 section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libkernel32.a(dkhchs00746.o)
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gs_support.c:70:(.text+0x52): relocation truncated to fit: R_X86_64_PC32 against symbol `__imp_GetCurrentProcessId' defined in .idata$5 section in C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libkernel32.a(dkhchs00536.o)
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gs_support.c:71:(.text+0x5b): additional relocation overflows omitted from the output
collect2.exe: error: ld returned 1 exit status
make: *** [<builtin>: test] Error 1```

Interesting. Thanks for reporting it. I have pushed a workaround. I would be interested to hear whether it works for you. The workaround is to use heap, as you proposed.

Please note that your error messages seem to confirm that the problem is with the BSS segment, not the stack. I guess a stack size problem would lead to a stack overflow at runtime, not a linking error.

I can't say that I fully understand the issue, but my feeling is that it's a linker limitation that might even qualify as a bug. Large BSS, when possible, is superior to large heap in my opinion. Anyway, sometimes one has to compromise...

Looking forward to your feedback. Out of curiosity, what do you use this SHA-256 implementation for?

After a little more thinking, my statement about BSS vs heap is an oversimplification. It depends, of course on the use case. For this test program, allocating the test data statically once for all made sense.

For many other use cases, heap allocation, or even better stack allocation when the data is not too large, is better, since it allows a minimization of data scope (create, use, destroy - i.e. better encapsulation, which limits the risks of unwanted side effects etc.).

Still, the linker should not dictate such choices.

I close this issue for now. It can apparently be reopened if needed.

I'm currently creating pure C library, primarily for game development. I have plan to build some small engine around it later as part of my bachelor's degree.

I needed some hash function and as it is not my specialization, I was searching for some easy to understand implementation that worked well with my needs and which I could easily integrate into my library.

If you want credits, don't worry, they will be in the header of your modified files :)


Anyway, back to the point - it seems to work now as expected :)

If I may suggest one last thing, memset is better than simple for-loop for what you are doing in construct_binary_messages(). It is platform-optimized in most compilers, thus should perform better or at least equal to simple for-loop which you wrote :)

I do not expect any acknowledgment. When I say public domain, I mean it. I was just curious.

About memset(), although we are still talking about the test code, which is not performance critical, I agree with you. Since the initialization happens to fill in arrays with constant uint8_t, which have the same size as unsigned char (this is always true as long as uint8_t exists), memset() is better. I have made the update and pushed. Good luck with your library.

Well, you'll get acknowledgment anyway :P
I always give credit where credit is due ;)

BTW, if you want, you can have look at this:
https://github.com/uzyfeo/sha-2/commit/21510159d27f18aa456a6f81dbd9a6199ea2178b

I did this in my library hash_test.c (well, it looked more distinct than this... I back-ported it mostly to be conformant with your style). I know that in this case, test performance is not that critical, However, as I was also testing speed and not just correctness (due to my own modifications, I needed to compare it with yours if I was not back-pedaling with some changes), I needed the code to be as fast as possible and have smallest possible margin of error.

My speed test was basically your test minus printing and hash->string conversion, finding out how long it took to compute 1,000,000 hashes. These changes being much more cache friendly (in theory at least) should prove to be just that tiny bit faster, which is always good when testing for performance. Also, it leaves smaller margin for error, as all data is always packed in same place and not fragmented in memory (although in quite giant blob... should not be a big issue though).