cmd/compile, runtime: investigate Windows stack overflows calling into system C libraries
kjk opened this issue · 60 comments
This came up with 1.8.3 when working on GUI Windows programs using lxn\walk library.
Relevant issues:
The problem is that Go linker sets very small stack size (I think 128 kB) in PE executables. The standard on Windows is more like 1-2 MB.
This is fine for code that only lightly uses system C libraries but when writing code that talks to win32 UI APIs, it's very likely to hit the stack limit and silently crash.
I encountered it because I tried to use webview (mshtml.dll) control and it crashed on 64-bit when rendering my (not very complicated) website. Other people seen such crashes as well.
There are work-arounds: one can edit PE header after the exe is built using e.g. editbin.exe
, but such tool is not necessarily available to the programmer (it's part of Visual Studio).
It would be much easier on Windows developers if there was a linker flag to set custom stack size so that it could be done directly with go build
. A library like lxn\walk could then document the need to increase stack size for Go Windows programs and recommend a
I don't know if such option would be relevant/needed for other OSes/exe formats.
Stack sizes isn't a knob programmers should think about in Go. That's why Go stacks dynamically split or adjust in size as needed.
When making a call to code not managed by Go's dynamic stack size checks, Go should be switching to a conservatively-sized stack. Maybe there's a bug on how that works on Windows.
For what it's worth, here's a callstack of the crash. It's on main thread:
.. and 0xe4 more stack frames likes this
e5 00000000`00088dd0 00007ffd`5eaacfcb MSHTML!CFormatInfo::FindFormattingParent+0x43a
e6 00000000`00088e20 00007ffd`5e99a90f MSHTML!CElement::ComputeFormatsVirtual+0x12b
e7 00000000`000897c0 00007ffd`5eab0f0a MSHTML!CElement::ComputeFormats+0x14f
e8 00000000`000898f0 00007ffd`5eaacfcb MSHTML!CFormatInfo::FindFormattingParent+0x43a
e9 00000000`00089940 00007ffd`5e99a90f MSHTML!CElement::ComputeFormatsVirtual+0x12b
ea 00000000`0008a2e0 00007ffd`5e999b61 MSHTML!CElement::ComputeFormats+0x14f
eb 00000000`0008a410 00007ffd`5e999765 MSHTML!CTreeNode::ComputeFormats+0x81
ec 00000000`0008a450 00007ffd`5e9ccde2 MSHTML!CTreeNode::ComputeFormatsHelper+0x35
ed 00000000`0008b200 00007ffd`5e9cde27 MSHTML!CTreeNode::GetCharFormatHelper+0xe
ee 00000000`0008b230 00007ffd`5ed6f105 MSHTML!CTreeNode::GetCharFormat+0x77
ef 00000000`0008b260 00007ffd`5ecd5226 MSHTML!CRecalcLinePtr::CalcAfterSpace+0x135
f0 00000000`0008b310 00007ffd`5ecc4c0e MSHTML!CRecalcLinePtr::MeasureLine+0x382
f1 00000000`0008b480 00007ffd`5ecc3375 MSHTML!CDisplay::RecalcLinesWithMeasurer+0x2ce
f2 00000000`0008b5e0 00007ffd`5ecda3d0 MSHTML!CDisplay::RecalcLines+0x65
f3 00000000`0008b820 00007ffd`5ecd7182 MSHTML!CDisplay::RecalcView+0x54
f4 00000000`0008b860 00007ffd`5ecd8196 MSHTML!CFlowLayout::CalcTextSize+0x302
f5 00000000`0008b9d0 00007ffd`5eddf0b5 MSHTML!CFlowLayout::CalcSizeCoreCSS1Strict+0xcc6
f6 00000000`0008bdc0 00007ffd`5ed4c2b4 MSHTML!CBodyLayout::CalcSizeCore+0x75
f7 00000000`0008be50 00007ffd`5ecc8af2 MSHTML!CFlowLayout::CalcSizeVirtual+0x84
f8 00000000`0008bee0 00007ffd`5ed48f67 MSHTML!CLayout::CalcSize+0x1da
f9 00000000`0008c080 00007ffd`5ecc8af2 MSHTML!CHtmlLayout::CalcSizeVirtual+0x9e7
fa 00000000`0008c3f0 00007ffd`5edf3efe MSHTML!CLayout::CalcSize+0x1da
fb 00000000`0008c590 00007ffd`5eae760f MSHTML!CLayout::CalcTopLayoutSize+0x5e
fc 00000000`0008c640 00007ffd`5ea3214d MSHTML!CView::EnsureSize+0xe7
fd 00000000`0008c680 00007ffd`5eae9e97 MSHTML!CView::EnsureView+0x43d
fe 00000000`0008c7a0 00007ffd`5ec60a6f MSHTML!CDoc::RunningToInPlace+0x1b7
ff 00000000`0008c8c0 00007ffd`5ec60754 MSHTML!CServer::TransitionTo+0xeb
100 00000000`0008c900 00007ffd`6d886ada MSHTML!CServer::Show+0x64
101 00000000`0008c930 00007ffd`6d884ce8 ieframe!CDocObjectHost::_ShowMsoView+0xba
102 00000000`0008c960 00007ffd`5eaebd4e ieframe!CDocObjectHost::ActivateMe+0x38
103 00000000`0008c990 00007ffd`5eaebcc7 MSHTML!CServer::ActivateView+0x72
104 00000000`0008c9c0 00007ffd`5ec6068f MSHTML!CServer::DoUIActivate+0x27
105 00000000`0008ca20 00007ffd`5ed682b5 MSHTML!CServer::DoVerb+0xaf
106 00000000`0008ca80 00007ffd`6d88681d MSHTML!CMarkup::Navigate+0x61
107 00000000`0008caf0 00007ffd`6d886dc0 ieframe!CDocObjectHost::_ActivateMsoView+0x91
108 00000000`0008cb90 00007ffd`6d888912 ieframe!CDocObjectHost::UIActivate+0x78
109 00000000`0008cbc0 00007ffd`6d8b0ad6 ieframe!CDocObjectView::UIActivate+0x22
10a 00000000`0008cbf0 00007ffd`6d8b21c1 ieframe!CBaseBrowser2::_UIActivateView+0xca
10b 00000000`0008cc20 00007ffd`6daa60c9 ieframe!CBaseBrowser2::v_ActivatePendingView+0x291
10c 00000000`0008ed70 00007ffd`6d9b82cb ieframe!CWebBrowserSB::v_ActivatePendingView+0x19
10d 00000000`0008eda0 00007ffd`6d8b2666 ieframe!`Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::Create'::`2'::`dynamic atexit destructor for 'moduleSingleton''+0x26e0b
10e 00000000`0008ee10 00007ffd`6daa1cf3 ieframe!CBaseBrowser2::Exec+0xd6
10f 00000000`0008ee70 00007ffd`6d9a6217 ieframe!CWebBrowserSB::Exec+0xd3
110 00000000`0008eef0 00007ffd`6d88273a ieframe!`Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::Create'::`2'::`dynamic atexit destructor for 'moduleSingleton''+0x14d570
111 00000000`0008ef60 00007ffd`6d8825a5 ieframe!CDocObjectHost::_OnReadyState+0x152
112 00000000`0008f1f0 00007ffd`6d8824c7 ieframe!CDocObjectHost::_OnChangedReadyState+0xd1
113 00000000`0008f2c0 00007ffd`5eba97d6 ieframe!CDocObjectHost::OnChanged+0x17
114 00000000`0008f2f0 00007ffd`5eba916d MSHTML!CBase::FirePropertyNotify+0x2f6
115 00000000`0008f390 00007ffd`5ead3cdc MSHTML!CMarkup::SetReadyState+0xfd
116 00000000`0008f3e0 00007ffd`5e960009 MSHTML!CMarkup::SetInteractiveInternal+0x36c
117 00000000`0008f720 00007ffd`5ead169d MSHTML!CMarkup::RequestReadystateInteractive+0xb5
118 00000000`0008f780 00007ffd`5ebbf9cc MSHTML!CMarkup::BlockScriptExecutionHelper+0x129
119 00000000`0008f7d0 00007ffd`5ead1abe MSHTML!CHtmPost::Exec+0x5dc
11a 00000000`0008f9d0 00007ffd`5ead199f MSHTML!CHtmPost::Run+0x32
11b 00000000`0008fa00 00007ffd`5ead185d MSHTML!PostManExecute+0x63
11c 00000000`0008fa40 00007ffd`5eb86780 MSHTML!PostManResume+0xa1
11d 00000000`0008fa80 00007ffd`5eb7042c MSHTML!CHtmPost::OnDwnChanCallback+0x40
11e 00000000`0008fad0 00007ffd`5ea97b71 MSHTML!CDwnChan::OnMethodCall+0x1c
11f 00000000`0008fb00 00007ffd`5ea90b39 MSHTML!GlobalWndOnMethodCall+0x251
120 00000000`0008fbb0 00007ffd`9a11bc50 MSHTML!GlobalWndProc+0xf9
121 00000000`0008fc40 00007ffd`9a11b5cf USER32!UserCallWinProcCheckWow+0x280
122 00000000`0008fda0 00000000`0045163e USER32!DispatchMessageWorker+0x19f
123 00000000`0008fe20 000000c0`42098d58 FindFilesGo+0x5163e
124 00000000`0008fe28 00000000`00790ca8 0x000000c0`42098d58
125 00000000`0008fe30 000000c0`42098cf0 FindFilesGo+0x390ca8
126 00000000`0008fe38 000000c0`42098cf0 0x000000c0`42098cf0
127 00000000`0008fe40 00000000`009d5be8 0x000000c0`42098cf0
128 00000000`0008fe48 00000000`00000000 0x9d5be8
Go code doing the windows message pump https://github.com/lxn/walk/blob/master/form.go#L336
For easy reproduction, I've created https://github.com/kjk/go20975
Stack sizes isn't a knob programmers should think about in Go.
The problem isn't in Go code.
The problem is that Go linker sets very small stack size (I think 128 kB) in PE executables. The standard on Windows is more like 1-2 MB.
Based on my reading of the code, the stack should be set to 2MB in cgo binaries. It's only 128KB in pure Go binaries. Could you double check what the PE header says?
0:000> !dh 400000 -f
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
8664 machine (X64)
C number of sections
0 time date stamp
4D4400 file pointer to symbol table
2126 number of symbols
F0 size of optional header
223 characteristics
Relocations stripped
Executable
App can handle >2gb addresses
Debug information stripped
OPTIONAL HEADER VALUES
20B magic #
3.00 linker version
32FE00 size of code
1B600 size of initialized data
0 size of uninitialized data
51440 address of entry point
1000 base of code
----- new -----
0000000000400000 image base
1000 section alignment
200 file alignment
3 subsystem (Windows CUI)
4.00 operating system version
1.00 image version
4.00 subsystem version
56B000 size of image
600 size of headers
0 checksum
0000000000020000 size of stack reserve
0000000000001000 size of stack commit
0000000000100000 size of heap reserve
0000000000001000 size of heap commit
0x20000
== 131,072 bytes.
The thing is, this code doesn't use cgo. All calls to win libraries are via syscall.
The discussion in lxn/walk#261 claims that adding a dummy import _ "runtime/cgo"
to the program would force the large stack but it doesn't seem to be the case in my testing. Maybe that changed sometime before 1.8.3.
Scratch that, it probably would have worked if I had gcc
installed:
C:\Go\pkg\tool\windows_amd64\link.exe: running gcc failed: exec: "gcc": executable file not found in %PATH%
I had quick look at how we set reserved stack size.
For Cgo program, it is 1M for 386 and 2M for amd64.
Search for oh64.SizeOfStackReserve and oh.SizeOfStackReserve in cmd/link/internal/ld/pe.go for first thread. All other threads start with
_beginthread(..., 0, ...)
in runtime/cgo/gcc_windows_386.c and runtime/cgo/gcc_windows_amd64.c, and _beginthread defaults to what first thread does.
For non-Cgo program, it is 128K.
Search for oh64.SizeOfStackReserve and oh.SizeOfStackReserve in cmd/link/internal/ld/pe.go for first thread. All other threads start with
stdcall6(_CreateThread, ..., 0x20000, ..., _STACK_SIZE_PARAM_IS_A_RESERVATION, ...)
So Cgo programs run with standard (by Windows standards) stacks. But non-Cgo programs run with small stacks. CL 2237 doubled the stack size (from 64K to 128K) already. @kjk can you, please, try make it 256K for non-Cgo - see if it fixes your problem. Maybe we can increase stack size again. I don't think it matters on amd64, and we have less and less of 386 computers.
Thank you
Alex
I tried 256kb, 512kb, 1mb, 2mb. My websites stops crashing at 512mb. Other websites need more. My vote would be for 2MB on win/64bit.
My websites stops crashing at 512mb.
I was hoping it would work with 256K.
My vote would be for 2MB on win/64bit.
What about 386? Should we make stacks different on 386 and amd64?
Alex
For 386 the standard seems to be 1 MB.
For 386 the standard seems to be 1 MB.
This will only allow Go program to start about 1000 threads on 386. See my comment https://go-review.googlesource.com/#/c/2237/2//COMMIT_MSG@18 Is that acceptable?
Alex
1000 real OS threads on 32 bit windows should be more than enough for anybody. Any program with that many threads would trash the CPU anyway - there's really no point launching more than ~NUMCPU os threads.
It's really mostly about main GUI thread, which is the thread pumping the message loop which ends up calling wndproc of windows and the code called from there.
That's the main thread that is created outside of control of Go runtime and uses the stack size from PE header.
I imagine other threads created by the Go runtime can have a stack size independent of PE value (given to CreateThread) and there's argument that those threads are not expected to have such heavy nesting as main GUI thread so could get away with less stack.
1000 real OS threads on 32 bit windows should be more than enough for anybody.
It is plenty for me. But other people might think differently.
Any program with that many threads would trash the CPU anyway - there's really no point launching more than ~NUMCPU os threads.
I really do not know. I will let @aclements decide.
It's really mostly about main GUI thread, which is the thread pumping the message loop which ends up calling wndproc of windows and the code called from there.
That's the main thread that is created outside of control of Go runtime and uses the stack size from PE header.
GUI does not have to run on the main thread. You can even run multiple GUI threads.
I imagine other threads created by the Go runtime can have a stack size independent of PE value (given to CreateThread) and there's argument that those threads are not expected to have such heavy nesting as main GUI thread so could get away with less stack.
I think these different thread sizes for different environments are already complicated enough. I do not want to make things even more complicated.
Alex
Can you explain how a "system call" wound up deep in the MSHTML library? (Every time I think I'm starting to get a feel for Windows there's something even more mystifying...) Skimming over lxn, it looks like the Go syscall package has provided enough rope to actually call into arbitrary Windows libraries without going through cgo, but I'm still not sure how that wound up in MSHTML...
Maybe a call to syscall.LoadLibrary
(or syscall.GetProcAddress
?) from outside std should trigger larger stacks?
Any program with that many threads would trash the CPU anyway - there's really no point launching more than ~NUMCPU os threads.
I wish this were true. :) There's (basically) no point in launching more than NUMCPU compute threads, but threads are also the unit of blocking for many operations, including many system calls and cgo calls. We generally do a good job of mapping blocking IO on to asynchronous system calls, but other things consume extra threads. So it's really NUMCPU + the number of things that can be blocked or in cgo simultaneously.
I imagine other threads created by the Go runtime can have a stack size independent of PE value (given to CreateThread) and there's argument that those threads are not expected to have such heavy nesting as main GUI thread so could get away with less stack.
Unless you're calling runtime.LockOSThread
from init
(maybe you are), Go makes no distinction between the "main" thread and other threads created later. It's just a big thread pool to the runtime.
Can you explain how a "system call" wound up deep in the MSHTML library?
Please see the callstack in #20975 (comment)
The structure of Window GUI app is:
- you create your windows
- you run the messsage pump
Message pump is:
for fb.hWnd != 0 {
switch win.GetMessage(&msg, 0, 0, 0) {
case 0:
return int(msg.WParam)
case -1:
return -1
}
if !win.IsDialogMessage(fb.hWnd, &msg) {
win.TranslateMessage(&msg)
win.DispatchMessage(&msg)
}
}
DispatchMessage is a syscall:
func DispatchMessage(msg *MSG) uintptr {
ret, _, _ := syscall.Syscall(dispatchMessage, 1,
uintptr(unsafe.Pointer(msg)),
0,
0)
return ret
}
This triggers arbitrarily deep processing inside windows code. In this case mshtml control was told to display a website, so, after downloading html, it started to parse and render it and that happened on main GUI thread, as part of processing a message that was triggered with DispatchMessage
.
Maybe a call to syscall.LoadLibrary (or syscall.GetProcAddress?) from outside std should trigger larger stacks?
LoadLibrary loads DLL from a file.
GetProcAddress finds a function (by name) inside of DLL that you just loaded into memory.
Once you have access to the function (address of it), and you know what parameters it takes, you can call it.
All these functions, obviously, assume you use standard Windows stack. Simple functions use very little stack, but complicated might use a lot.
Microsoft provide a lot of DLLs packed with different functions that do different things for you.
I have not used MSHTML DLL myself, but, from what I understand, it can display full browser inside of your GUI application. Not surprising it would use a lot of stack.
But I would say MSHTML DLL is probably extreme case here.
Alex
@kjk, thanks for the explanation. It's interesting that message loops haven't changed since I last wrote Windows GUI code 15+ years ago. :) I take it some other part of your application set up the MSHTML control, again without using any cgo?
Once you have access to the function (address of it), and you know what parameters it takes, you can call it.
Right. My point is that we're now well outside the realm of what's really considered a "system call". I realize the distinction is blurry on Windows, but I assert that all of the "system calls" exposed by the std syscall package on Windows use very little stack and then enter the kernel (or use clever tricks to stack in user space, but still use very little stack). The stack size is set based on this assumption.
The lxn package may be using syscall.Syscall
, but these really aren't system calls in the traditional sense, and they clearly violate the stack assumption. This was intended to be the realm of cgo calls, but the syscall package gives you enough power to make these calls without cgo on Windows.
That suggests that we need to either always use 2MB stacks on Windows, or we need to do something cleverer to detect if you're escaping into the deep C world, such as observing LoadLibrary
calls from outside std.
/cc @randall77
I note that the linker already changes its behavior based on whether reflect.Value.Call
is linked into the program. We could similarly detect syscall.LoadDLL
.
I take it some other part of your application set up the MSHTML control, again without using any cgo?
Correct. The MSHTML control is just one of many used to build GUI windows. So you create a window, then you add controls to it (buttons, edit controls, list and others) - MSHTML control is just another control. You do all that by calling Windows APIs. I can give you some links, if you are interested, but these are just functions that are loaded from DLLs. There are some some complications, like for example, all GUI code must run on single thread (syscall.LockThread) and it uses callbacks (syscall.NewCallback), but Go program is no different from others that use theses DLLs.
but I assert that all of the "system calls" exposed by the std syscall package on Windows use very little stack and then enter the kernel (or use clever tricks to stack in user space, but still use very little stack). The stack size is set based on this assumption
On Windows every thread (including first) starts with standard Windows stack. All Windows syscalls switch to that stack before calling DLL functions. These DLL functions are called in user space. Some of them switch to kernel space, but some (for example if you build C DLL yourself) do not switch to kernel space.
they clearly violate the stack assumption
I do not see what is violated.
to either always use 2MB stacks on Windows
We can do that. @kjk program stops crashing at 512kb, so we can go with that instead.
Alex
I don't think 512 kb is enough. It's where the simplest program stopped crashing on the simplest website. I haven't done much testing beyond that but it stands to reason other websites (or other programs) might require more than 512 kb.
I think the value should be what the standard for windows linker is i.e. 1 MB (https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx), which seems to be the same for 32bit and 64bit (https://blogs.technet.microsoft.com/markrussinovich/2009/07/05/pushing-the-limits-of-windows-processes-and-threads/).
Less than that and Go code will run into this crash more often than C code.
And I would still like the linker option to bump it to larger value if I determine that my particular application needs that, especially on 64bit where larger reserved stack has essentially no penalty because address space is aplenty.
I get the desire that make it "just work" but you can't in this particular case.
I don't think 512 kb is enough. It's where the simplest program stopped crashing on the simplest website. I haven't done much testing beyond that but it stands to reason other websites (or other programs) might require more than 512 kb.
Please, see if you can break Go with 512kb stack.
Thank you.
Alex
There is no doubt that I can. I'll just allocate more than 512 kb on the stack.
I broke 256 kb on literally one of the first real programs I wrote that used mshtml. There's infinite number of websites and infinite number of programs that use other potentially stack hungry libraries.
If I'm intentional about it then I'll break any limit. If I'm not intentional and just writing random windows programs then I have infinite search space.
The question is: what is a reasonably safe value.
And the answer is: what everyone else is using i.e. 1 MB.
Those are the constraints under which Window OS developers and other WIndows developers operate. If they write code that uses more than 1 MB of stack, their programs will crash, so they have incentive to stay under that limit.
The lower the limit the higher probability that Go programs will crash because they'll hit the limit.
I'll also note that there might more bad going on than just stack size limit. When I use the repro program I linked to and visit https://vox.com, it crashes even if I set very large limit, like 2 MB. Sometimes it prints "fatal: more gc" sometimes it doesn't and it crashes it a very bad way (not an orderly panic that prints a crashing callstack but a vanishing act where process is wiped out without a trace).
Unfortunately delve doesn't debug this kind of code at all and windbg doesn't understand Go symbols and I know nothing about debugging runtime so that's as deep as I can go on this.
But I do have a consistent repro of a bad crash.
I note that the linker already changes its behavior based on whether reflect.Value.Call is linked into the program. We could similarly detect syscall.LoadDLL.
@ianlancetaylor, I believe LoadDLL
will always be linked into Windows programs because it's used by the syscall
package itself. The linker would have to distinguish uses in syscall
from uses outside the package. Alternatively, we could refactor syscall
a bit so it uses some internal function. That's slightly tricky because it's a few layers deep, but might be easier than figuring out what calls it in the linker.
(syscall.LoadLibrary
doesn't have this problem since we already have the exported/internal split.)
they clearly violate the stack assumption
I do not see what is violated.
@alexbrainman, I think we're talking past each other, since the violation I'm talking about is what this whole issue is about. The runtime assumes anything called a "syscall" will not consume more than 128kB of user-space stack. I believe that is true of everything exported by the syscall
package itself. @kjk's application makes a "syscall" that overflows the 128kB stack, hence violating the runtime's assumption about stack usage.
And the answer is: what everyone else is using i.e. 1 MB.
Those are the constraints under which Window OS developers and other WIndows developers operate. If they write code that uses more than 1 MB of stack, their programs will crash, so they have incentive to stay under that limit.
@kjk, I think this is an excellent reason to raise the size to the default when there's any possibility of a call to regular Windows code (which we clearly need to be more conservative about). Maybe we should just always do this on 64-bit. But it also suggests that a linker flag to set it manually may be overkill? Unless it's common to build Windows applications with larger stacks?
I'll also note that there might more bad going on than just stack size limit.
What if you try setting it to something significantly larger? Say, 32MB?
@aclements Ah, sorry, I missed the path from NewLazyDLL
to LoadDLL
and the calls to NewLazyDLL
in syscall/zsyscall_windows.go. Refactoring does look somewhat painful, but while I can think of worse solutions I can't think of a better one.
It's crashing even with 16mb of reserved stack on https://vox.com
It prints "fatal: morestack on g0" and it crashes at:
(3508.2d50): Break instruction exception - code 80000003 (first chance)
*** WARNING: Unable to verify timestamp for C:\Users\kjk\src\go\src\github.com\kjk\go20975\go20975.exe
*** ERROR: Module load completed but symbols could not be loaded for C:\Users\kjk\src\go\src\github.com\kjk\go20975\go20975.exe
go20975+0x4dac6:
00000000`0044dac6 03488b add ecx,dword ptr [rax-75h] ds:ffffffff`ffffffa2=????????
0:000> kb
# RetAddr : Args to Child : Call Site
00 00000000`0044cf6a : 00000000`0044da5e 00000000`0042de20 00000000`0196fef0 00000000`1a9a5e40 : go20975+0x4dac6
01 00000000`0044da5e : 00000000`0042de20 00000000`0196fef0 00000000`1a9a5e40 00000000`01961409 : go20975+0x4cf6a
02 00000000`0042de20 : 00000000`0196fef0 00000000`1a9a5e40 00000000`01961409 00000000`00451834 : go20975+0x4da5e
03 00000000`0196fef0 : 00000000`1a9a5e40 00000000`01961409 00000000`00451834 00000000`00623d28 : go20975+0x2de20
04 00000000`1a9a5e40 : 00000000`01961409 00000000`00451834 00000000`00623d28 00000000`01961330 : 0x196fef0
05 00000000`01961409 : 00000000`00451834 00000000`00623d28 00000000`01961330 00000000`00000028 : 0x1a9a5e40
06 00000000`00451834 : 00000000`00623d28 00000000`01961330 00000000`00000028 00007ffd`5f888488 : 0x1961409
07 00000000`00623d28 : 00000000`01961330 00000000`00000028 00007ffd`5f888488 00000000`0000000b : go20975+0x51834
08 00000000`01961330 : 00000000`00000028 00007ffd`5f888488 00000000`0000000b 00000000`10530000 : go20975+0x223d28
09 00000000`00000028 : 00007ffd`5f888488 00000000`0000000b 00000000`10530000 00000000`00000000 : 0x1961330
0a 00007ffd`5f888488 : 00000000`0000000b 00000000`10530000 00000000`00000000 00000000`00000000 : 0x28
0b 00000000`0000000b : 00000000`10530000 00000000`00000000 00000000`00000000 00000000`01961409 : MSHTML!CUString::`vftable'
0c 00000000`10530000 : 00000000`00000000 00000000`00000000 00000000`01961409 00000000`01961550 : 0xb
0d 00000000`00000000 : 00000000`00000000 00000000`01961409 00000000`01961550 00000000`00000000 : 0x10530000
Note that this callstack is likely incomplete as it doesn't look right. I think it's windbg not showing it all the way.
At this point it's net yet fatal. when I run in the debugger (windbg) I can do g
to continue. It'll print "fatal: more stack on g0" and crash in the same place , with ever increasing callstack. After a couple of g
it looks like:
(3508.2d50): Break instruction exception - code 80000003 (first chance)
go20975+0x4dac6:
00000000`0044dac6 03488b add ecx,dword ptr [rax-75h] ds:ffffffff`ffffffa2=????????
0:000> kb
# RetAddr : Args to Child : Call Site
00 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
01 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
02 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
03 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
04 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
05 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
06 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
07 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
08 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
09 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
0a 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
0b 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
0c 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
0d 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
0e 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
0f 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
10 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
11 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
12 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
13 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
14 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
15 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
16 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
17 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
18 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
19 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
1a 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`0044cf6a : go20975+0x4dac6
1b 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`0044cf6a 00000000`0044da5e : go20975+0x39822
1c 00000000`00439822 : 00000000`0044dac6 00000000`0044cf6a 00000000`0044da5e 00000000`0042de20 : go20975+0x4dac6
1d 00000000`0044dac6 : 00000000`0044cf6a 00000000`0044da5e 00000000`0042de20 00000000`0196fef0 : go20975+0x39822
1e 00000000`0044cf6a : 00000000`0044da5e 00000000`0042de20 00000000`0196fef0 00000000`1a9a5e40 : go20975+0x4dac6
1f 00000000`0044da5e : 00000000`0042de20 00000000`0196fef0 00000000`1a9a5e40 00000000`01961409 : go20975+0x4cf6a
20 00000000`0042de20 : 00000000`0196fef0 00000000`1a9a5e40 00000000`01961409 00000000`00451834 : go20975+0x4da5e
21 00000000`0196fef0 : 00000000`1a9a5e40 00000000`01961409 00000000`00451834 00000000`00623d28 : go20975+0x2de20
22 00000000`1a9a5e40 : 00000000`01961409 00000000`00451834 00000000`00623d28 00000000`01961330 : 0x196fef0
23 00000000`01961409 : 00000000`00451834 00000000`00623d28 00000000`01961330 00000000`00000028 : 0x1a9a5e40
24 00000000`00451834 : 00000000`00623d28 00000000`01961330 00000000`00000028 00007ffd`5f888488 : 0x1961409
25 00000000`00623d28 : 00000000`01961330 00000000`00000028 00007ffd`5f888488 00000000`0000000b : go20975+0x51834
26 00000000`01961330 : 00000000`00000028 00007ffd`5f888488 00000000`0000000b 00000000`10530000 : go20975+0x223d28
27 00000000`00000028 : 00007ffd`5f888488 00000000`0000000b 00000000`10530000 00000000`00000000 : 0x1961330
28 00007ffd`5f888488 : 00000000`0000000b 00000000`10530000 00000000`00000000 00000000`00000000 : 0x28
29 00000000`0000000b : 00000000`10530000 00000000`00000000 00000000`00000000 00000000`01961409 : MSHTML!CUString::`vftable'
2a 00000000`10530000 : 00000000`00000000 00000000`00000000 00000000`01961409 00000000`01961550 : 0xb
2b 00000000`00000000 : 00000000`00000000 00000000`01961409 00000000`01961550 00000000`00000000 : 0x10530000
Eventually this growing stack will kill the process.
So it seems there are 2 issues:
- with 16 mb of reserved stack, it probably should never get into a state of "fatal: more stack g0".
- when it does get into that state, handling of that condition creates infinite recursive loop
Or maybe getting "fatal: more stack g0" is intended behavior it's just that logic to extend the stack on win 64 is buggy.
Also to show its 64bit, 16mb stack:
!dh 400000 -f
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
8664 machine (X64)
C number of sections
0 time date stamp
4D4400 file pointer to symbol table
2126 number of symbols
F0 size of optional header
223 characteristics
Relocations stripped
Executable
App can handle >2gb addresses
Debug information stripped
OPTIONAL HEADER VALUES
20B magic #
3.00 linker version
32FE00 size of code
1B600 size of initialized data
0 size of uninitialized data
51440 address of entry point
1000 base of code
----- new -----
0000000000400000 image base
1000 section alignment
200 file alignment
3 subsystem (Windows CUI)
4.00 operating system version
1.00 image version
4.00 subsystem version
56B000 size of image
600 size of headers
54DA03 checksum
0000000001000000 size of stack reserve
0000000001000000 size of stack commit
0000000000100000 size of heap reserve
0000000000001000 size of heap commit
0 DLL characteristics
0 [ 0] address [size] of Export Directory
4FF000 [ 44C] address [size] of Import Directory
56A000 [ 567] address [size] of Resource Directory
0 [ 0] address [size] of Exception Directory
0 [ 0] address [size] of Security Directory
0 [ 0] address [size] of Base Relocation Directory
0 [ 0] address [size] of Debug Directory
0 [ 0] address [size] of Description Directory
0 [ 0] address [size] of Special Directory
0 [ 0] address [size] of Thread Storage Directory
0 [ 0] address [size] of Load Configuration Directory
0 [ 0] address [size] of Bound Import Directory
331000 [ 130] address [size] of Import Address Table Directory
0 [ 0] address [size] of Delay Import Directory
0 [ 0] address [size] of COR20 Header Directory
0 [ 0] address [size] of Reserved Directory
0:002> ? 1000000
Evaluate expression: 16777216 = 00000000`01000000
Notice that this particular session was with 16mb of both reserved and committed stack. Should runtime ever need to extend the stack if it was already committed to the max?
@kjk, if 16MB isn't enough, are you sure there isn't something else going on here, like an infinite recursive?
Also, changing it in the executable header is only half of the problem. You also need to change the second argument to _CreateThread
in runtime/os_windows.go:newosproc
(currently 0x20000). If you only changed it in the linker, then it's quite possible you're still making these calls on a small stack.
Or maybe getting "fatal: more stack g0" is intended behavior it's just that logic to extend the stack on win 64 is buggy.
Go stacks and system stacks are two different things. Go code runs on Go stacks that can grow up to a basically arbitrary size and aren't affected by the stack size settings. C code runs on the system stack (also known as the g0 stack), which can't grow and is limited to the reserved size specified in the binary or in the call to CreateThread.
Here's a case in Cloud Foundry where a .NET executable was replaced with a golang executable and we started seeing StackoverflowExceptions. Bumping up the stack size of the golang executable to 1MB with editbin "fixed" the issue, or at least makes the executable behave the same as the old .NET exe.
I think defaulting to 1MB stack size on Windows makes sense from the perspective of compatibility with the Windows ecosystem.
@sneal, thanks for the referenced issue.
I think defaulting to 1MB stack size on Windows makes sense from the perspective of compatibility with the Windows ecosystem.
To be clear, we should absolutely be using larger stacks if there's any chance of calling into general C code. Our current condition for determining this is sufficient on other platforms, but not on Windows.
(I am a little confused on one point: Go uses 1MB stacks on 32-bit, but 2MB stacks on 64-bit if it thinks there's any C code. Is that atypical? Is the usual Windows default 1MB for 64-bit? What's the default for 32-bit?)
@alexbrainman, you seem opposed to unconditionally using larger stacks. I understand the concern on 32-bit, but are you opposed to always using larger stacks on 64-bit (if so, can you explain why)?
Reading over https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx, I see that the default stack size is 1MB (perhaps regardless of 32 vs 64-bit?). I don't know how we ended up with 2MB stacks.
CL https://golang.org/cl/49331 mentions this issue.
This should be fixed on windows/amd64 on master now (which will be released as Go 1.9 fairly soon).
For windows/386, I'm inclined to also make it always use 1MB stacks, but I think I'll wait until 1.10 opens because it's not obvious to me that that would be a low-risk change because of the more limited address space. We could do the more sophisticated heuristics with detecting applications that load libraries I was talking about earlier, but it's not obvious to me that's worth the complexity.
CL https://golang.org/cl/49610 mentions this issue.
We could do the more sophisticated heuristics with detecting applications that load libraries I was talking about earlier, but it's not obvious to me that's worth the complexity.
I would not bother. Once CL 49610 is submitted, Go stacks are the same as C programs compiled by Miscrosoft compilers. It is good enough.
Alex
I've been playing with @kjk's reproducer and I think this issue may actually be unrelated to stack size. Or at the very least, stack size is only one part of it.
The original hypothesis by @lxn (based on the error message fatal: morestack on g0
) was that the MSHTML library (invoked hackishly but cleverly through the syscall interface) required more stack space than Go provided by default. The issue only manifests on windows/amd64
, so that was a reasonable assumption. After some investigation into how to bump up the stack size while still retaining the syscall-only implementation, I found (see lxn/walk#261 for full discussion) that using a blank CGO import (_ "runtime/cgo"
) would bump up the stack size without requiring an external linker. This procedure did seem to solve the issue for windows/amd64
, so I assumed that stack size was the only issue.
When he was running his tests, @kjk initially tried importing runtime/cgo
as I suggested, but he abandoned this approach because Go tried to invoke gcc
to perform linking. Go defaults to using an external linker when CGO and a non-Go-compiled object are present, and in his case he was trying to link in a manifest compiled by the akavel/rsrc tool. For future reference (at least for testing with this reproducer), this behavior can be disabled by setting GO_EXTLINK_ENABLED=0
, in which case Go will use its internal linker (which works fine for this particular case). One can also simply deploy the manifest in text form alongside the binary (rename it to go20975.exe.manifest
) or embed it post-compilation with mt.exe
.
As a sensible alternative approach, @kjk tried to use editbin.exe
to manually bump up the stack size, but he still saw the same issues, even with absurdly large stack sizes. I also tried this approach with the exact same stack sizes that Go uses when CGO is present (the PE headers were almost identical), but still saw the same issue.
Now, to be fair, there are other changes (see https://golang.org/cl/49331) outside of just the PE headers necessary to truly support a larger stack size, but I have also tried Go 1.9 RC 2 (with CL 49331) and the issue is not solved by the larger stack size on windows/amd64
.
It occured to me that perhaps the other side effects of CGO might be responsible for the issue. Taking the reproducer, modifying it with a blank CGO import, and then compiling it (using either GO_EXTLINK_ENABLED=0
, a separate manifest, or a manifest embedded with mt.exe
- I tried all 3) always resulted in a stable binary where the morestack
issue never occurred.
I'm not a CGO expert, but my understanding is that the toolchain does a lot more than just bump up the stack size when CGO is present. Perhaps @minux or @alexbrainman can comment about what exactly happens when CGO is present on Windows. Is it possible that something like a different thread-local storage scheme only present with CGO is allowing this to work? I assume these sorts of issues would never come up in "normal" usage of the syscall interface on Windows since most of those functions wouldn't use TLS. This is of course just one suggestion. I assume that Internet Explorer (which is what the WebView
controller is hosting via COM/OLE) is probably doing some additional sandboxing on amd64
systems (and not 386
for whatever reason) - is it possible that this would only be compatible with CGO? Perhaps it's a calling convention issue?
It's still very possible that stack size is also a culprit here, but I'm starting to doubt it. Is there some way to simulate the effects of CGO on the resulting binary sans the increased stack size? If that is sufficient to solve the issue, then perhaps the larger default stack size on windows/amd64
can be reverted and the windows/386
behavior can be left as-is. Perhaps the solution would be to have the remainder of the CGO behavior always-on for Windows, though I'm not sure if this would cause significant performance degradation.
It's also worth noting, if you try the reproducer, that the JavaScript errors that pop up are a separate issue arising from the compatibility behavior of the WebView
component. These are totally unrelated to the morestack
issue and can be solved by changing the compatibility view settings in the registry (and simply hiding these JavaScript errors from the user - which is what IE does by default).
I think the key difference is that if you don't use cgo, then new threads are created by the function newosproc
in runtime/os_windows.go (https://tip.golang.org/src/runtime/os_windows.go#L618). That uses a fixed stack size that is not modified by editbin or anything else. When using cgo, new threads are created by the function _cgo_sys_thread_start
in runtime/cgo/gcc_windows_amd64.c (https://tip.golang.org/src/runtime/cgo/gcc_windows_amd64.c#L27) which simply calls _beginthread
and as such is affected by editbin.
@ianlancetaylor Would the code to set up TLS (https://tip.golang.org/src/runtime/cgo/gcc_windows_amd64.c#L53) be having some affect? I remember that there was quite a bit of discussion in #11058 about the complexities of TLS on Windows, but I don't know if that could be playing a role here. It might explain why simply setting the stack size doesn't seem to fix the problem (and that CGO does fix it).
@havoc-io Sure, TLS may be involved. I don't know.
But what I was trying to explain above is that when a program is built without cgo, simply setting the stack size does absolutely nothing. When a program is built with cgo, then setting the stack size does change the stack size. Or so it seems to me.
I might be wrong but _beginthread
, when given 0 as stack size as in gcc_windows_amd64.c
, uses the stack size provided in PE, so editbin would change it:
https://msdn.microsoft.com/en-us/library/kdzttdcb.aspx
The operating system handles the allocation of the stack when either _beginthread or _beginthreadex is called; you don't have to pass the address of the thread stack to either of these functions. In addition, the stack_size argument can be 0, in which case the operating system uses the same value as the stack that's specified for the main thread.
Also, the thread that crashes is main thread, which, I believe, is created by the OS.
Also, looking at the callstack of the crash it's pretty clear that the first bug to investigate and fix is the fact that Go runtime has infinite loop:
00 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
01 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
02 00000000`00439822 : 00000000`0044dac6 00000000`00439822 00000000`0044dac6 00000000`00439822 : go20975+0x4dac6
03 00000000`0044dac6 : 00000000`00439822 00000000`0044dac6 00000000`00439822 00000000`0044dac6 : go20975+0x39822
I haven't succeeded in getting symbols for the stack but it seems the crash happens because A calls B which calls A etc. and eventually that will run out of stack regardless of how much stack there is.
@ianlancetaylor Ah, so your point was that setting the stack size via editbin.exe
would have no effect on the stack size for threads created via newosproc
, which is the thread creation mechanism for the non-CGO case, so that perhaps playing with editbin.exe
to set stack size wasn't really doing anything when we hadn't compiled with CGO. Is newosproc
also used for creating the main thread, or would that thread's size be taken directly from the PE headers? I ask because the crash seems to be happening on the main thread (which is locked at program start in order to invoke GUI code). Also, what about threads created inside a DLL - would they be re-routed through newosproc
or is that mechanism only really used for creating new M
threads?
In any case, the stack size should be sufficiently large even in the non-CGO case in the Go 1.9 RCs (it's at least as big as the stacks for C/C++ binaries that use IE via COM/OLE and don't crash), but since the reproducer seems to crash with the Go 1.9 RCs, it definitely seems like there's something beyond stack size at play here.
@kjk You're right about the behavior of _beginthread
, but I think his point was that the code in gcc_windows_amd64.c
won't even be called in the non-CGO case. But I think you're right (and that's why I've asked) that the main thread will inherit the stack size from the PE headers since it's created by the OS, so that even in the non-CGO case, at least that thread should have a decent stack size if set by editbin.exe
, so it wouldn't make sense for it to be running out.
But again, in any case, Go 1.9 RCs should be giving every thread loads of stack, and something is still clearly broken.
After stepping through the code in the debugger, I don't think it's related to stack size.
The issue is that we have:
0:000> kb
# RetAddr : Args to Child : Call Site
00 00000000`0045104a : 00000000`004519ee 00000000`004304f0 00000000`00b9fef0 00000000`00000000 : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382]
01 00000000`004519ee : 00000000`004304f0 00000000`00b9fef0 00000000`00000000 00007ffa`59b6e618 : go20975!runtime.exitsyscallfast.func1+0xaa [C:\Go\src\runtime\proc.go @ 2717]
02 00000000`004304f0 : 00000000`00b9fef0 00000000`00000000 00007ffa`59b6e618 00000000`00455804 : go20975!runtime.systemstack+0x7e [C:\Go\src\runtime\asm_amd64.s @ 347]
03 00000000`00b9fef0 : 00000000`00000000 00007ffa`59b6e618 00000000`00455804 00000000`006307d8 : go20975!runtime.mstart [C:\Go\src\runtime\proc.go @ 1125]
04 00000000`00000000 : 00007ffa`59b6e618 00000000`00455804 00000000`006307d8 00000000`00b8be40 : 0xb9fef0
We are in:
TEXT runtime·morestack(SB),NOSPLIT,$0-0
// Cannot grow scheduler stack (m->g0).
get_tls(CX)
MOVQ g(CX), BX
MOVQ g_m(BX), BX
MOVQ m_g0(BX), SI
CMPQ g(CX), SI
JNE 3(PC)
CALL runtime·badmorestackg0(SB)
INT $3
The source of the problem, it seems, is that morestack
was called on scheduler stack i.e. g.m.g0 == g.
Best I can tell, this is called from exitsyscallfast:
if sched.pidle != 0 {
var ok bool
systemstack(func() {
ok = exitsyscallfast_pidle()
if ok && trace.enabled {
if oldp != nil {
// Wait till traceGoSysBlock event is emitted.
// This ensures consistency of the trace (the goroutine is started after it is blocked).
for oldp.syscalltick == _g_.m.syscalltick {
osyield()
}
}
traceGoSysExit(0)
}
})
if ok {
return true
}
}
return false
It seems like systemstack is supposed to ensure that the function called is not on scheduler stack but sometimes it fails.
Then the called function has unconditional call to morecallstack which triggers call to badmorestack0
which seems to be "should never happen" thing.
Unfortunately systemstack purposefully cuts off callstack, so I can't tell which function calls it but this is correlated with mshtml showing a modal dialog, so it most likely has to do with nested Go => C => Go transitions.
Narrowing it down further, it seems like it's always a result of calling back from C to Go:
00 00000000`00b975c0 : go20975!runtime.exitsyscallfast+0xcc [C:\Go\src\runtime\proc.go @ 2717]
01 00000000`00000028 : 0xb975c0
02 00000000`00450fa0 : 0x28
03 000000c0`42069357 : go20975!runtime.exitsyscallfast.func1 [C:\Go\src\runtime\proc.go @ 2717]
04 000000c0`42019300 : 0x000000c0`42069357
05 000000c0`4201e000 : 0x000000c0`42019300
06 000000c0`420693a8 : 0x000000c0`4201e000
07 00000000`00434777 : 0x000000c0`420693a8
08 00000003`00000002 : go20975!runtime.exitsyscall+0x67 [C:\Go\src\runtime\proc.go @ 2630]
09 000000c0`4201e000 : 0x00000003`00000002
0a 000000c0`4201e000 : 0x000000c0`4201e000
0b 000000c0`42019300 : 0x000000c0`4201e000
0c 000000c0`42069410 : 0x000000c0`42019300
0d 00000000`004026bc : 0x000000c0`42069410
0e 00000000`00000000 : go20975!runtime.cgocallbackg+0x7c [C:\Go\src\runtime\cgocall.go @ 184]
I captured it by using runtime.exitsyscallfast+0xcc "~. kb; g"
which , in windbg, breaks in exitsyscallfast right before calling systemstack
, prints callstack (kb
) and continues (g
).
systemstack
will then call lamda, which will call morestack
which will crash with cut off callstack.
The last callstack is the one which triggered systemstack
. It's still not the full callstack.
This seems to be limited to amd64. I spent a while trying to repro this on 386 build and couldn't.
Just to be clear - this debugging is on non-CGO binaries (i.e. those without a blank runtime/cgo
import)? I noticed the runtime.cgocallbackg
frame in your stack trace, but I guess maybe this CGO functionality is re-used in the Windows syscall implementation?
Yes, non-CGO, latest tests are with go 1.9 rc2
Is newosproc also used for creating the main thread, or would that thread's size be taken directly from the PE headers?
newosproc is not used to create main thread. main thread is created by Windows process loader. The loader uses pe file parameters to configure main thread stack.
Also, what about threads created inside a DLL - would they be re-routed through newosproc or is that mechanism only really used for creating new M threads?
If any non Go code calls Windows CreateThread API, the call cannot magically "be re-routed through newosproc". In fact this scenario is broken at this moment (see issue #6751).
I would not be surprised if your problem is a dup of #6751. Given how much external code you use, can you be certain that none of that code calls Go code on a thread that has not been created by Go?
Alex
I would not be surprised if your problem is a dup of #6751. Given how much external code you use, can you be certain that none of that code calls Go code on a thread that has not been created by Go?
The crash always happen on the main thread.
If you read earlier comments I think I narrowed it down to morestack
crashing because it detects we're on scheduler stack even though it's being called via systemstack()
whose purpose seems to be ensuring that doesn't happen.
So my best guess is that systemstack
on amd64 doesn't always do the job (the issue doesn't happen on i386 which is not unexpected because it's different assembly code).
Hi @kjk, I have tested your app.
test.manifest
is from https://github.com/lxn/walk#create-manifest-testmanifest .
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
<assemblyIdentity version="1.0.0.0" processorArchitecture="*" name="SomeFunkyNameHere" type="win32"/>
<dependency>
<dependentAssembly>
<assemblyIdentity type="win32" name="Microsoft.Windows.Common-Controls" version="6.0.0.0" processorArchitecture="*" publicKeyToken="6595b64144ccf1df" language="*"/>
</dependentAssembly>
</dependency>
<asmv3:application>
<asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">
<dpiAware>true</dpiAware>
</asmv3:windowsSettings>
</asmv3:application>
</assembly>
$ go get -u github.com/lxn/win
$ go get -u github.com/lxn/walk
$ go get -u github.com/akavel/rsrc
$ rsrc -manifest test.manifest -o rsrc.syso
$ go build -ldflags="-H windowsgui"
$ ./testwalk.exe # It can run as expected, not be crashed!
My Windows OS is Windows 10 64 Bit
. Go version is as follow.
$ go version
go version go1.10 windows/amd6
$ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\xgf\AppData\Local\go-build
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=D:\gopath
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\xgf\AppData\Local\Temp\go-build028597650=/tmp/go-build -gno-record-gcc-switches
/all, FYI!
Updated:
@xgfone my website (https://blog.kowalczyk.info) is no longer a good test because I fixed a bug on my website generator that created mis-formatted, deeply nested html that triggered this problem easily.
Try https://vox.com or some other complicated website and make sure to click on links etc. It doesn't necessarily just crash on first render.
@kjk Yes, I have also tested https://vox.com.
It works after starting, and the program does not crash. Then I clicked and opened some links. When going on clicking a link, it crashed as follow.
# ...
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
Segmentation fault
The number of the buffer line of my console is 10000, and the output has exceeded the whole console buffer.
FYI.
If you read earlier comments I think I narrowed it down to morestack crashing because it detects we're on scheduler stack even though it's being called via systemstack() whose purpose seems to be ensuring that doesn't happen.
Based on your debugging, both systemstack
and morestack
appear to be doing what they're supposed to. systemstack
switches to the scheduler (aka system aka g0) stack, and morestack
detects that we ran out of space on the scheduler stack. systemstack
isn't supposed to ensure morestack
doesn't crash.
The question is why we ran out of space on the scheduler stack. Are you able to print the value of RSP and g0.stack.lo and g0.stack.hi at the crash from windbg?
When going on clicking a link, it crashed as follow.
fatal: morestack on g0
fatal: morestack on g0
I suspect I know why it's repeating until it crashes hard. After printing this message, it does a runtime.abort
, which raises an INT $3
, which I suspect is going into the runtime's exception handler, which eventually tries to call runtime.exceptionhandler
on the g0 stack. That detects that we're out of g0 stack space and aborts again. @alexbrainman, does this sound plausible? Is there a way we can abort the process without recursively invoking the exception handler? Or maybe runtime.sigtramp
should detect debug exceptions and return without calling runtime.exceptionhandler
?
Actually, the fact that it was able to print >10000 "morestack on g0" errors before segfaulting is telling. I assume the OS stack has a guard page, which means there must be a lot more space on the OS stack than Go thinks there is.
Oh, I'm pretty sure I know what's going on. While we allocate large stacks for both the main thread and new threads, both rt0_go
(in the non-cgo case) and tstart_stdcall
(only called in the non-cgo case) set the g0 stack bounds assuming the stack is only 64k. Hence, if C code calls back into Go code, it's entirely possible that we have plenty of stack space left, but we've gone past the 64k the runtime thinks it has.
(Edited to add: in the cgo case it looks like we get the stack bounds right for all of the threads. x_cgo_init
in runtime/cgo/gcc_windows_amd64.c
updates the bounds for the main thread and threadentry
updates them for new threads.)
@aclements That makes sense.
I'm not sure I'll be capable of testing unreleased Go version but if you have a fix, it should be easy to verify using https://github.com/kjk/go20975 test program and going e.g. to vox.com and clicking around.
It should crash 100% with current Go and not crash after the fix.
@kjk, thanks. It would be easy if I had a graphical Windows machine/VM. :)
@alexbrainman, can you think of an easy automated test for this? I think I need a C function in a DLL that uses lots of stack and then calls back into Go, but all in a non-cgo binary.
Change https://golang.org/cl/120195 mentions this issue: runtime: initialize g0 stack bounds on Windows to full stack
Change https://golang.org/cl/120336 mentions this issue: runtime: query thread stack size from OS on Windows
I suspect I know why it's repeating until it crashes hard. After printing this message, it does a runtime.abort, which raises an INT $3, which I suspect is going into the runtime's exception handler, which eventually tries to call runtime.exceptionhandler on the g0 stack. That detects that we're out of g0 stack space and aborts again. @alexbrainman, does this sound plausible?
It does sounds reasonable (you assume that runtime.morestack
is called on runtime.exceptionhandler
entry).
Is there a way we can abort the process without recursively invoking the exception handler?
I suppose we could call runtime.exit
instead of executing INT 3.
Or maybe runtime.sigtramp should detect debug exceptions and return without calling runtime.exceptionhandler?
We do handle debug exceptions (search for _EXCEPTION_BREAKPOINT), but it is too late for this scenario. So, sure, we could modify runtime.sigtramp
to handle this situation. Maybe we could mark runtime.exceptionhandler
or at least some part of it as "no to grow stack" instead?
I think I need a C function in a DLL that uses lots of stack and then calls back into Go, but all in a non-cgo binary.
We have quite a few tests in runtime package that builds C dlls on the fly and use them to test runtime code. See TestStdcallAndCDeclCallbacks, TestReturnAfterStackGrowInCallback, TestFloatArgs and TestDLLPreloadMitigation.
Alex