GCC 6.2.1 breaks libmill
raedwulf opened this issue ยท 17 comments
Accidentally pressed enter when creating this issue... so you might have an empty notification!
I compiled libmill and libdill with GCC 6.2.1 and it appears the coroutines no longer work in both (it segfaults).
I've only investigated so far to see if it was only my assembly context switching that was the root of the problem and it is not. It occurs with the sigsetjmp/longjmp implementation as well - suggesting it might be something else...
I've narrowed down the issue:
#include <assert.h>
#include <stdio.h>
#include "libdill.h"
coroutine void worker(int count, const char *text) {
int i;
for(i = 0; i != count; ++i) {
printf("%s\n", text);
int rc = msleep(now() + 10);
assert(rc == 0);
}
}
int main() {
int cr1;
{
sigjmp_buf *ctx;
int h = dill_prologue(&ctx);
if(h >= 0) {
if(!sigsetjmp(*ctx, 0)) {
int dill_anchor[dill_unoptimisable1];
dill_unoptimisable2 = &dill_anchor;
volatile size_t sz = (char*)&dill_anchor - (char*)hdata(h, NULL);
char dill_filler[sz]; // this line breaks because it now tries to forcefully build the stack this size!
dill_unoptimisable2 = dill_filler;
worker(4, "a ");
dill_epilogue();
}
}
cr1 = h;
}
hclose(cr1);
return 0;
}
I've figured out what GCC 6.2.1 introduces that breaks the "black magic". In older versions of GCC, the -fstack-protector
does not cover all cases. They have fixed it in 6.2.1 and so it detects when the stack allocated by char dill_filler[sz];
will overflow the stack.
A workaround is to pass -fno-stack-protector
in CFLAGS
. However, I believe we should find a better solution to this as -fstack-protector
is a common default (how common, I'm unsure).
If you want to try it yourself, this is the smallest case I could come up with which works with GCC 4.9.3 and 5.4.0 but not with 6.2.1.
#include "libdill.h"
coroutine void hello() {
return;
}
int main() {
int cr1;
{
sigjmp_buf *ctx;
int h = dill_prologue(&ctx);
if(h >= 0) {
if(!sigsetjmp(*ctx, 0)) {
int dill_anchor[dill_unoptimisable1];
dill_unoptimisable2 = &dill_anchor;
volatile size_t sz = (char*)&dill_anchor - (char*)hdata(h, NULL);
char dill_filler[sz];
dill_unoptimisable2 = dill_filler;
hello();
dill_epilogue();
}
}
cr1 = h;
}
hclose(cr1);
return 0;
}
Do you have any idea what can be possibly done except of turning off stack protection? It seems to me that we are doing exactly what stack protection is supposed to prevent so there's little choice but to disable it.
My only solution in mind is inline assembly but it has the downside that it would need to be written per architecture. Making it work with -fstack-protector is desirable because we do want stack protection elsewhere as one of the main use case is for networking. I can do x86(_64) and arm (will need to setup my various SBCs and learn ARM assembly) and might be able to do powerpc (on os x) if my imac g5 still boots (dodgy hdd). I'm unable to test modern OS X. I should be able to test DragonflyBSD but I'm having difficulty with the onboard LAN card (then again I could set all the BSDs up in qemu if necessary).
I've been working on a solution... and sad to say, but there's no easily-compatible solution.
The closest I arrived at was:
#define go(fn) \
({\
sigjmp_buf *ctx;\
int h = dill_prologue(&ctx);\
if(h >= 0) {\
if(!dill_setjmp(*ctx)) {\
asm volatile("leaq -8(%%rax), %%rsp":: "rax" (hquery(h, NULL)));\
fn;\
dill_epilogue();\
}\
}\
h;\
})
This almost works but parameters can be sometimes loaded in fn
before the inline assembly is executed. In which case "rsp" is used after the inline assembly to read local variables, which of, course, are invalid because the stack has been switched.
Short of a dynamic code generator, poorer syntax or reduced functionality will be introduced... sadly. C Macros are just not powerful enough.
Apart from sticking with -fno-stack-protector
, the only alternative I can see is to add mandatory first parameter to the coroutine to pass the stack pointer. The coroutine will then have to execute some code to switch to the correct stack before the rest of the coroutine is executed.
I think I might have a solution! If we can override __stack_chk_fail
with our own implementation, I think we might be on a winner. Then again that would introduce a lot of inefficiency.
Reference here
There's a compromise flag: -fstack-protector-explicit
which allows attributes to be passed to opt in for stack protection using __attribute__((stack_protect))
but there's no equivalent opt out.
Fixed it. the unoptimisable vars and VLAs were actually doing more than I thought they were. TLDR; the VLAs are necessary to coerce the compiler to always generate a stack frame (they are unimplementable without a stack frame). The stack frame lets fn
reference the local variables (EDIT: which store the function arguments) needed even when the stack pointer is changed.
This works on x86-64:
#define go(fn) \
({\
sigjmp_buf *ctx;\
int h = dill_prologue(&ctx);\
if(h >= 0) {\
if(!dill_setjmp(*ctx)) {\
int dill_anchor[dill_unoptimisable1];\
dill_unoptimisable2 = &dill_anchor;\
asm volatile("leaq -8(%%rax), %%rsp"::"rax"(hquery(h, NULL)));\
fn;\
dill_epilogue();\
}\
}\
h;\
})
When I have time, I'll port the assembly one-liners for each platform (it literally just writes to the stack pointer so that the stack protector ignores it).
Ha, I wrote the code myself and only now I understand how it really works!
When making a patch, please add a comment about what the assembly line does so that people looking at the code won't have to scratch their heads over it.
Also, a fallback to -fno-stack-protector would be nice.
Yes, I have that in mind already :) and a comment saying it breaks if you build your application without -fno-stack-protector
There is a more stable alternative to this - libcoro and lwan do coroutines through having a fixed, single, argument for each coroutine and a stack pointer that is passed through arguments. However, we lose the beauty of being able to simply pass a nested function call-style interface (although something similar could potentially be achieved through variadic macros).
I think that's an easy trade-off. Stack protector may be nice to have but it is not a critical feature. A sane way to invoke coroutines, on the other hand, is huge improvement to usability/readability and I'd prefer it over stack protection any time. (Not even mentioning that you've apparently found a way to have both at the same time.)
Some of the code I came up with in one of my attempts was mostly acceptable. With some extreme macro-wizardry it might be possible to wrap libcoro's style coroutines. I'd leave this for now, but in case GCC >7.x decides to do something that breaks the current technique, this is a potential avenue to explore.
#define dill_noinl __attribute__((noinline))
#define dill_naked __attribute__((optimize("omit-frame-pointer")))
#define dill_wrap dill_noinl dill_naked
#define dill_coro dill_noinl dill_naked
#define dill_loadsp(fn, ...) dill_wrap void fn(void *s, __VA_ARGS__)\
{asm("leaq 0(%%rax), %%rsp\njmp " #fn "_body__":: "rax" (s));}
#define coroutine(fn, ...) \
dill_coro void fn##_body__(void *s, __VA_ARGS__);\
dill_loadsp(fn, __VA_ARGS__)\
dill_coro void fn##_body__(void *s, __VA_ARGS__)
/* Statement expressions are a gcc-ism but they are also supported by clang.
Given that there's no other way to do this, screw other compilers for now.
See https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Statement-Exprs.html */
#define go(fn, ...) \
({\
sigjmp_buf *ctx;\
int h = dill_prologue(&ctx);\
if(h >= 0) {\
if(!dill_setjmp(*ctx)) {\
fn(hquery(h, NULL), __VA_ARGS__);\
dill_epilogue();\
}\
}\
h;\
})
Which leads to an interface:
coroutine(worker, int fd) {
int rc = fdin(fd, -1);
assert(rc == 0);
event = 1;
}
int h = go(worker, fd);
P.S. This technique was promising, but haven't managed to get it to work yet - I think if I use the VLA trick from earlier, I can force the stack frame to be generated consistently. I've only managed to get this to work on -O0 so far and occasionally on earlier GCC versions.
Thinking in long-term I have a better proposal: Let's infiltrate the C standardization committee and make go() part of the language. That would force compiler people to support that kind of thing in some way.
For what it's worth, it is possible to disable stack protector on a per function basis (it was mentioned above that it isn't). The trick is:
__attribute__((optimize("no-stack-protector")))
before the function definition (works from gcc 4.4 onwards or so). Mentioning it here since it may be useful to you and also because this issue appears when you search for per-function stack protection issues.
Thanks, I'll look into it!