golang/go

cmd/go: -buildmode=c-shared should work on windows

chai2010 opened this issue · 216 comments

cmd/go: -buildmode=c-shared should work on windows

It would help to be more specific: different -buildmode options are, and always will be, supported on different systems. Which -buildmode options are you specifically interested in?

Most Windows users are probably looking for a way to generate DLLs:

--buildmode=c-shared

or

--buildmode=dll

I only need -buildmode=c-shared, to generate DLLs for other C/C++ users.
If Go can generate DLLs, I can say bye-bye to C++.

PS: I also hope #9510 gets fixed in Go 1.5!

I would also like to see this happen. Is anyone currently working on this issue? Are there any CLs that could be reviewed?

I also need this. If I had some idea how much work was involved I might be willing to contribute an implementation. I know that Windows DLLs are fairly different from ELF style shared libraries, so if this is an enormous task it won't be something I can commit to.

Is there any way to find out how much/what kind of work would be involved?

@nadiasvertex The main thing is to make the function hostlink in cmd/link/internal/ld/lib.go do the right thing on Windows. On Darwin it invokes the C linker with -dynamiclib. On GNU/Linux it invokes the C linker with -Wl,-Bsymbolic -Wl,-z,relro -shared -Wl,-z,nodelete.

Basically, if you produce a list of commands that will turn a Windows object file into a Windows DLL, change hostlink to invoke those commands.
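A minimal sketch of the per-OS flag selection described above. The helper name sharedLinkFlags is hypothetical and this is not the real hostlink code; the Darwin and GNU/Linux flags are the ones quoted in this comment, and the Windows case is deliberately left as an error because the right commands had not been worked out at this point in the thread.

```go
package main

import (
	"errors"
	"fmt"
)

// sharedLinkFlags is a hypothetical helper (not part of cmd/link). It returns
// the extra arguments passed to the C linker when building a shared library,
// as described in the comment above.
func sharedLinkFlags(goos string) ([]string, error) {
	switch goos {
	case "darwin":
		return []string{"-dynamiclib"}, nil
	case "linux":
		return []string{"-Wl,-Bsymbolic", "-Wl,-z,relro", "-shared", "-Wl,-z,nodelete"}, nil
	case "windows":
		// Assumption: left unimplemented here; finding these flags is the open task.
		return nil, errors.New("c-shared: linker flags for windows not determined yet")
	default:
		return nil, fmt.Errorf("c-shared: unsupported GOOS %q", goos)
	}
}

func main() {
	for _, goos := range []string{"darwin", "linux", "windows"} {
		flags, err := sharedLinkFlags(goos)
		if err != nil {
			fmt.Printf("%s: %v\n", goos, err)
			continue
		}
		fmt.Printf("%s: %v\n", goos, flags)
	}
}
```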

That said, it's possible that Windows needs to know the list of symbols that are callable from outside the DLL. It used to need to know that, but it's been many years since I looked at it. If you need that list of symbols, it's available by looping over the symbol table and looking for the Cgoexport field having a CgoExportStatic flag.
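As a rough illustration of "looping over the symbol table", the sketch below filters symbols by a Cgoexport bit flag. The Symbol type and the flag values are stand-ins invented for this example; only the names Cgoexport and CgoExportStatic come from the comment above, and the linker's real symbol type carries far more state.

```go
package main

import "fmt"

// Flag values are assumptions made for this sketch.
const (
	CgoExportDynamic = 1 << iota
	CgoExportStatic
)

// Symbol is a hypothetical, stripped-down stand-in for a linker symbol.
type Symbol struct {
	Name      string
	Cgoexport uint8
}

// staticExports returns the names of all symbols whose Cgoexport field has
// the CgoExportStatic bit set, i.e. the functions callable from outside.
func staticExports(syms []Symbol) []string {
	var names []string
	for _, s := range syms {
		if s.Cgoexport&CgoExportStatic != 0 {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	syms := []Symbol{
		{Name: "Sum", Cgoexport: CgoExportStatic},
		{Name: "runtime.main"},
		{Name: "Helper", Cgoexport: CgoExportDynamic | CgoExportStatic},
	}
	fmt.Println(staticExports(syms)) // prints [Sum Helper]
}
```

A list like this is what would feed an export list for the DLL, if Windows turns out to need one.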

Great! I'll look at it. I suspect that the challenge is mostly in knowing
where the linker is on Windows.

@ianlancetaylor I started looking into this a little more, and it turns out that I have some questions about some fundamental parts of the architecture. For example:

  1. I see that building go on Windows appears to require a gcc-ish compiler, like mingw. Consequently, I assume that doing C archive building or shared library building would require the same thing (instead of, say, MSVC.)
  2. So far I have enabled the commands for doing a go build with c-shared and I get "_rt0_amd64_windows_lib: not defined", which appears to be set in lib.go/libinit. I'm not entirely sure what to look for here, so some pointers would be helpful.

In case it is helpful, the current trace output looks like:

WORK=C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698
mkdir -p $WORK\command-line-arguments_obj
mkdir -p $WORK\command-line-arguments_obj\exe
cd Z:\projects\test\src\calc
CGO_LDFLAGS="-g" "-O2" "z:\projects\go\pkg\tool\windows_amd64\cgo.exe" -objdir "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj" -importpath command-line-arguments "-exportheader=C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_install.h" -- -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj" calc.go
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -print-libgcc-file-name
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj" -g -O2 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_main.o" -c "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_main.c"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj" -g -O2 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_export.o" -c "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_export.c"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj" -g -O2 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj\calc.cgo2.o" -c "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj\calc.cgo2.c"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_main.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_export.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj\calc.cgo2.o" -g -O2
"z:\projects\go\pkg\tool\windows_amd64\cgo.exe" -objdir "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj" -dynpackage main -dynimport "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_.o" -dynout "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_import.go"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_all.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_export.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj\calc.cgo2.o" -g -O2 -Wl,-r -nostdlib -Wl,--start-group -lmingwex -lmingw32 -Wl,--end-group C:/TDM-GCC-64/bin/../lib/gcc/x86_64-w64-mingw32/5.1.0/libgcc.a
"z:\projects\go\pkg\tool\windows_amd64\compile.exe" -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments.a" -trimpath "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698" -p main -buildid af4f2da53bc903b4d17a55032a8fca5f579d7452 -D /Z/projects/test/src/calc -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698" -pack "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_gotypes.go" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj\calc.cgo1.go" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_cgo_import.go"
pack r "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments.a" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj_all.o" # internal
cd .
"z:\projects\go\pkg\tool\windows_amd64\link.exe" -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments_obj\exe\a.out.exe" -L "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698" -extld=gcc -buildmode=c-shared -buildid=af4f2da53bc903b4d17a55032a8fca5f579d7452 -v "C:\Users\CHRIST~1\AppData\Local\Temp\go-build642482698\command-line-arguments.a"

command-line-arguments

HEADER = -H11 -T0x401000 -D0x0 -R0x1000
searching for runtime.a in $WORK/runtime.a
searching for runtime.a in z:\projects\go/pkg/windows_amd64/runtime.a
0.00 deadcode
0.01 pclntab=154410 bytes, funcdata total 33488 bytes
0.01 dodata
0.01 reloc
0.02 reloc
_rt0_amd64_windows_lib.ptr: _rt0_amd64_windows_lib: not defined
0.02 asmb
0.02 codeblk
0.03 datblk
0.03 sym
0.03 dwarf
0.03 headr
0.03 symsize = 0
0.03 symsize = 0
_rt0_amd64_windows_lib.ptr: undefined: _rt0_amd64_windows_lib

I don't think anybody thinks we must require GCC, but clearly we do need some other toolchain. If somebody wants to add support for MSVC, I think that would be great. In any case, I think we do want to make it possible to support -buildmode=c-shared using GCC.

_rt0_amd64_windows_lib should be defined in runtime/rt0_windows_amd64.s, and should probably look something like _rt0_amd64_linux_lib in runtime/rt0_linux_amd64.s. I admit that I completely forgot about that part of this work. You'll have to tweak the argc/argv handling to be appropriate for Windows, and you'll have to implement a Windows version of newosproc0.

You might also have to worry about how thread-local storage works in a DLL (I have no idea how this sort of thing works on windows)

@nadiasvertex The main thing is to make the function hostlink in cmd/link/internal/ld/lib.go do the right thing on Windows. On Darwin it invokes the C linker with -dynamiclib. On GNU/Linux it invokes the C linker with -Wl,-Bsymbolic -Wl,-z,relro -shared -Wl,-z,nodelete.

If you just want to build a DLL that does not contain any C code, you don't have to use an external linker. I am sure you can modify the Go linker to produce what you want. The Go linker does just that when it creates Windows executables. I don't see how creating a DLL would be different.

You will also have to deal with the issues every Windows DLL deals with. You must provide the minimum set of functions a DLL requires. You have to deal with your DLL's exported functions being called on different threads. You have to deal with exceptions.

You might also have to worry about how thread-local storage works in a DLL (I have no idea how this sort of thing works on windows)

What is wrong with the way thread-local storage works in Go windows executables now?

Alex

Let me be clear, I assume that go has already dealt with most of these
problems. I can certainly provide some glue for existing services. However,
if there is a need to write totally new runtime support I will need a lot
more direction.

On windows there were certain conventions for dll entry points. However,
those are generally C library artifacts. If go has particular needs I might
be able to satisfy those, given some pointers.

My personal need is to take go code, expose some of it via a C ABI, and
link an executable written in another language with it. The code needs to
work on Windows, Linux, and Mac. Android, iOS, and Windows Mobile are a
plus, but not urgently pending. Go satisfies most of these wonderfully. I
am interested in any solution that helps me accomplish this goal. I would
prefer a robust, integrated solution. However, if it is an unreasonably
large task for a new contributor then I would be happy to know about
workarounds.

If you just want to build a DLL that does not contain any C code, you don't have to use an external linker. I am sure you can modify the Go linker to produce what you want. The Go linker does just that when it creates Windows executables. I don't see how creating a DLL would be different.

It's unusual to want to build a shared library/DLL if you don't have a C toolchain available. On Unix we decided to simply rely on that, rather than spend the time to teach the Go linker how to generate a shared library. On Windows, in which DLLs are more different from executables than they are on ELF, I would suggest following the same strategy. Since in the general case we must use external linking when generating a shared library, I don't think it's so bad to always require it.

On windows there were certain conventions for dll entry points. However,
those are generally C library artifacts. ...

That is not true. There are a variety of non-C compilers on Windows that will let you build a DLL.

My personal need is to take go code, expose some of it via a C ABI, and
link an executable written in another language with it.

You don't have to use DLLs for that. You can also include your Go code as part of your final executable. But then you need to be specific about the tools you use to build that executable. Is that going to be gcc? If yes, then what Ian suggests is your path. I do not know much about gcc, so I am not familiar with what is required.

You can also try to create the DLL using gcc. Again, that is the path Ian suggested.

If you don't want to rely on gcc to build your programs, then you would have to build the Windows DLL in the Go linker itself. The code lives in $GOROOT/src/cmd/link/internal/ld/pe.go. The current code produces Windows PE executables (among other things). You can modify it to output a Windows DLL (with whatever a DLL requires). I am familiar with pe.go, but I have never built a DLL from scratch. I am happy to help, though.

You should probably try the gcc approach first, because it has already been implemented on some non-Windows OSes. So perhaps it will be easy enough.

It's unusual to want to build a shared library/DLL if you don't have a C toolchain available. ...

Perhaps I misunderstand you, but I disagree. I don't see how building Windows DLL is different from building Windows executable. Surely we require gcc for cgo, but other than that.

On Unix we decided to simply rely on that, rather than spend the time to teach the Go linker how to generate a shared library.

Fair enough.

On Windows, in which DLLs are more different from executables than they are on ELF, I would suggest following the same strategy. Since in the general case we must use external linking when generating a shared library, I don't think it's so bad to always require it.

There are advantages to not requiring gcc on Windows. Go just works out of the box. When things break, you have all the source code with you, and the source code is Go. You can build a Go Windows executable on any other OS; you could even use a Plan 9 computer to build one.

Alex

@alexbrainman This issue is specifically about -buildmode=c-shared. The only reason to use -buildmode=c-shared is to build a DLL that can be linked into a program written in C. My thinking is that people doing that are probably also writing a program in C, and therefore have a C compiler. But I'm obviously not a Windows developer, so I may be wrong.

This issue is specifically about -buildmode=c-shared. The only reason to use -buildmode=c-shared is to build a DLL that can be linked into a program written in C. My thinking is that people doing that are probably also writing a program in C, and therefore have a C compiler. But I'm obviously not a Windows developer, so I may be wrong.

Fair enough. Similarly, I am not familiar with what -buildmode provides. What -buildmode flag should I use to build a Windows DLL, one that can be called from any Windows executable or from another DLL? These callers (executables and DLLs) can be written in C, but don't have to be; they can be written in Go. We (the Go executables we build) use system DLLs (produced by Microsoft) all the time. Do you see Go providing a way to build DLLs just like these?

Alex

The intent is to write a plugin package that can be used to open Go shared libraries built with -buildmode=plugin. But this has not been implemented (see https://golang.org/s/execmodes for this and more about -buildmode).

On Windows, it may already work to use -buildmode=c-shared and open the DLL from a Go program. The disadvantage would be that you can only use functions with a C style interface, and you would have two different Go heaps and garbage collectors--one in the main program and one in the DLL. That is, -buildmode=c-shared is intended to give you a complete DLL that can be opened by a program written in any language, so it includes a complete copy of the Go runtime.

Thank you for explaining, Ian.

Alex

You might also have to worry about how thread-local storage works in a DLL (I have no idea how this sort of thing works on windows)

What is wrong with the way thread-local storage works in Go windows executables now?

Nothing, but it works by assuming g lives at a fixed offset from $fs, and when you are in a dynamic library that's going to be loaded into another process, you can't assume a fixed offset because something in that process or another dynamic library might already be using that offset. Or at least, that's the sort of problem you get on an ELF system -- like I said, I don't know how windows works here. But I suspect you'll need to change something in the area.

Yes. We would have to find different way to find TLS slot.

Alex

To provide some clarity on my motivation and goals:

I want to build a .dll because I don't know what code will eventually consume the pieces I am responsible for. In some cases a static library is enough, but in other cases it is not. For example, JNI and C#'s PInvoke require a .dll file. If it turns out that making a go .dll via -buildmode=c-shared is a larger project than I can take on, I hope to fall back to creating a static library via -buildmode=c-archive and then write some C functions that call the Go code, and are themselves DLL exports.

I understand that cgo already lets me register callbacks, and it certainly allows me to call into C code which may do just about anything. Consequently, I would imagine that the TLS problem has already been solved. I understand that a .dll adds some additional wrinkles to this situation, but I have to imagine that that is limited to the setup code.

With respect to @mwhudson, I don't understand what problem you are describing. If you are saying that the $fs segment register is used as a base, I'm not sure why you think what some other process is doing matters. The entire register set contents are isolated per process. The code itself lives where it lives, and is updated through the GOT and PLT tables at dll initialization. I confess that it has been a long time since I dealt with any of this in detail so I certainly may be missing something important here.

However, considering that this all works for non-Windows platforms, and considering that the Windows support for generating native go code is very good, I have to imagine that the vast majority of these issues are known and have been solved. I am very interested in specific guidance, and in known limitations of the go toolchain. I'm not terribly interested in generic "thar be dragons" commentary, because if this were really trivial it would be done already. :-)

I have gotten to the point where I am trying to implement newosproc0. Unfortunately, it seems like implementing this function requires deep understanding of Go's thread creation mechanics on Windows. Any pointers here? I looked at the Linux version but it calls functions which don't even exist for Windows.

I'm no Windows expert, but looking at newosproc in os1_windows.go I suspect that newosproc0 can simply be a copy of that. On Unix the difference between newosproc and newosproc0 is that newosproc0 is responsible for allocating the stack safely, but on Windows newosproc ignores the stk parameter anyhow.

I noticed that the c-shared and c-archive are pretty similar. I switched to c-archive mode for the initial implementation to reduce the complexity. When this works I'll move back to c-shared mode.

I created a small project:

//calc.go
package main

import "C"

//export Sum
func Sum(x, y int) int {
	return x + y
}

func main() {
}

//test_driver.c
#include "test.h"

int main(int argc, char **argv) {
	return Sum(5, 10);
}

On Linux this appears to work perfectly:

// commands
csnelson@nix-us2ua34705wc:~/meps/projects/test/src/calc$ go build -buildmode=c-archive -o test.a calc.go
csnelson@nix-us2ua34705wc:~/meps/projects/test/src/calc$ gcc -c test_driver.c -o test_driver.o
csnelson@nix-us2ua34705wc:~/meps/projects/test/src/calc$ gcc -o test test_driver.o test.a -lpthread
csnelson@nix-us2ua34705wc:~/meps/projects/test/src/calc$ ./test
csnelson@nix-us2ua34705wc:~/meps/projects/test/src/calc$ echo $?
15

On Windows I get through the Go part of the build without error, but I get undefined references from the C linker:

csnelson@nix-us2ua34705wc:~/meps/projects/test/src/calc$ gcc -o test test_driver.o test.a
test.a(000000.o): In function `Sum':
C:/Users/CHRIST~1/AppData/Local/Temp/go-build344693770/command-line-arguments/_obj/_cgo_export.c:19: undefined reference to `_cgoexp_f070ceaf1261_Sum'
C:/Users/CHRIST~1/AppData/Local/Temp/go-build344693770/command-line-arguments/_obj/_cgo_export.c:19: undefined reference to `crosscall2'
test.a(000000.o):calc.cgo2.c:(.rdata$.refptr._cgoexp_f070ceaf1261_Sum[.refptr._cgoexp_f070ceaf1261_Sum]+0x0): undefined reference to `_cgoexp_f070ceaf1261_Sum'
collect2.exe: error: ld returned 1 exit status

The contents of the test.a archive:
$ ar -t test.a
go.o
000000.o
000001.o

$ nm test.a
000000.o:
0000000000000000 b .bss
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 d .data
00000000000000f2 N .debug_abbrev
0000000000000000 N .debug_abbrev
0000000000000000 N .debug_aranges
0000000000000030 N .debug_aranges
0000000000000000 N .debug_frame
00000000000002ba N .debug_info
0000000000000000 N .debug_info
0000000000000000 N .debug_line
00000000000000ad N .debug_line
0000000000000000 N .debug_loc
0000000000000000 p .pdata
0000000000000000 r .rdata$.refptr._cgoexp_f070ceaf1261_Sum
0000000000000000 r .rdata$zzz
0000000000000020 r .rdata$zzz
0000000000000000 R .refptr._cgoexp_f070ceaf1261_Sum
0000000000000000 t .text
0000000000000040 t .text
0000000000000000 r .xdata
U _cgo_wait_runtime_init_done
U _cgoexp_f070ceaf1261_Sum
U crosscall2
0000000000000000 T Sum

000001.o:
0000000000000000 b .bss
0000000000000000 b .bss
0000000000000000 b .bss
0000000000000000 b .bss
0000000000000000 b .bss
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 d .data
0000000000000000 d .data
0000000000000000 d .data
0000000000000000 d .data
0000000000000000 d .data
0000000000000130 N .debug_abbrev
0000000000000032 N .debug_abbrev
00000000000002ac N .debug_abbrev
0000000000000019 N .debug_abbrev
0000000000000000 N .debug_abbrev
0000000000000447 N .debug_abbrev
00000000000000d0 N .debug_aranges
00000000000000a0 N .debug_aranges
0000000000000070 N .debug_aranges
0000000000000020 N .debug_aranges
0000000000000000 N .debug_aranges
0000000000000040 N .debug_aranges
0000000000000000 N .debug_frame
0000000000000068 N .debug_frame
00000000000000f8 N .debug_frame
0000000000000a19 N .debug_info
000000000000018b N .debug_info
00000000000005ae N .debug_info
00000000000002e2 N .debug_info
0000000000000000 N .debug_info
0000000000000ea2 N .debug_info
0000000000000298 N .debug_line
00000000000000e9 N .debug_line
000000000000003a N .debug_line
000000000000001d N .debug_line
00000000000001bd N .debug_line
0000000000000000 N .debug_line
0000000000000000 N .debug_loc
0000000000000072 N .debug_loc
0000000000000211 N .debug_loc
000000000000000c N .debug_str
0000000000000000 N .debug_str
0000000000000048 p .pdata
0000000000000024 p .pdata
0000000000000000 p .pdata
0000000000000000 r .rdata
0000000000000060 r .rdata
0000000000000030 r .rdata
0000000000000080 r .rdata$zzz
0000000000000020 r .rdata$zzz
0000000000000040 r .rdata$zzz
0000000000000000 r .rdata$zzz
0000000000000060 r .rdata$zzz
0000000000000110 t .text
0000000000000050 t .text
0000000000000000 t .text
00000000000001f0 t .text
0000000000000000 t .text
0000000000000000 t .text
0000000000000000 r .xdata
0000000000000010 r .xdata
0000000000000028 r .xdata
U __imp___iob_func
U __imp__beginthread
U __imp__errno
00000000000001a0 T _cgo_sys_thread_start
0000000000000030 T _cgo_wait_runtime_init_done
U abort
00000000000001f0 T crosscall_amd64
U fprintf
U free
U fwrite
U malloc
0000000000000110 t threadentry
0000000000000090 T x_cgo_free
0000000000000180 T x_cgo_init
0000000000000050 T x_cgo_malloc
0000000000000040 T x_cgo_notify_runtime_init_done
0000000000000000 T x_cgo_sys_thread_create
00000000000000a0 T x_cgo_thread_start
nm: go.o: File format not recognized
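One way to read a listing like the one above is to collect every undefined ("U") symbol and drop those that some other member defines globally. The sketch below does that on a small excerpt of the nm output; the function name unresolved and the constant sampleNM are made up for this example, and treating only the T, D, B and R symbol types as definitions is a simplifying assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// sampleNM is a short excerpt of the nm listing above.
const sampleNM = `000000.o:
                 U _cgo_wait_runtime_init_done
                 U _cgoexp_f070ceaf1261_Sum
                 U crosscall2
0000000000000000 T Sum

000001.o:
0000000000000030 T _cgo_wait_runtime_init_done
00000000000001f0 T crosscall_amd64
`

// unresolved returns, in sorted order, the symbols that are listed as
// undefined and are not defined globally anywhere in the same nm output.
func unresolved(nmOutput string) []string {
	defined := map[string]bool{}
	undefined := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(nmOutput))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		switch {
		case len(f) == 2 && f[0] == "U":
			undefined[f[1]] = true
		case len(f) == 3 && len(f[1]) == 1 && strings.Contains("TDBR", f[1]):
			defined[f[2]] = true
		}
	}
	var out []string
	for name := range undefined {
		if !defined[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(unresolved(sampleNM)) // prints [_cgoexp_f070ceaf1261_Sum crosscall2]
}
```

The two names left over are the same ones the C linker complains about, which suggests (without proving it) that their definitions are expected to come from go.o, the member nm could not read.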

@ianlancetaylor

The only reason to use -buildmode=c-shared is to build a DLL that can be linked into a program written in C. My thinking is that people doing that are probably also writing a program in C, and therefore have a C compiler. But I'm obviously not a Windows developer, so I may be wrong.

Here I dare disagree: on Windows, DLLs are very often used as a mechanism for providing plugins. In such a case, you start with some third-party app written by somebody else (quite often under proprietary license, so no source-code; but it's the same for open-source too, see e.g. Notepad++), which you want to extend with a plugin. If the app supports plugins, it would usually describe an interface that a plugin DLL must conform to, and how to register the DLL in the app (e.g. specific directory where you must put the DLL). So, no C toolchain in sight really; it's quite normal to write the DLL e.g. in Delphi or C#.

@nadiasvertex Something is wrong with the test.a file. The output you are showing looks as though -buildmode=c-archive was not specified at all. Use go build -x to see the commands that the Go command is executing. Make sure that -buildmode=c-archive is being passed to the linker. Use go build -ldflags=-v to pass -v to the linker; make sure it is doing the right thing to generate the archive, which I assume means invoking the ar command.

To provide some clarity on my motivation and goals:

What about building all your functionality as part of single Go executable and exporting it as RPC (via TCP or similar)?

why you think what some other process is doing matters

@mwhudson is talking about "thread local storage" here, not about a per-process view. The Go runtime needs some block of memory that is "thread specific": you can read and write that memory from any thread, and its contents look different from different threads, but the same from the same thread. As @minux mentioned, Windows provides a magic memory block (the TIB); you can store a pointer at a particular offset in this block, and that pointer will be different on different threads. The Go runtime uses that slot for this purpose, but we also call external code (system DLLs and cgo). Luckily for Go, no other code uses that slot. But if, for example, we create a Go DLL with its own runtime that uses the same slot, then calling this DLL from a Go executable will be fatal.
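The slot collision can be shown with a toy model. Everything in the sketch below (the tib and goRuntime types, the slot numbers, the pointer values) is invented for illustration and has nothing to do with the real runtime's data structures; it only shows why two runtimes that pick the same fixed per-thread slot overwrite each other's pointer.

```go
package main

import "fmt"

// tib is a toy model of a per-thread block of pointer-sized slots.
type tib struct {
	slots [8]uintptr
}

// goRuntime is a toy model of one copy of the Go runtime that keeps its
// per-thread pointer in a fixed slot.
type goRuntime struct {
	name string
	slot int
}

func (r goRuntime) setG(t *tib, g uintptr) { t.slots[r.slot] = g }
func (r goRuntime) getG(t *tib) uintptr    { return t.slots[r.slot] }

// clobbered reports whether a call from exe into dll on the same thread
// overwrites the pointer that exe stored before the call.
func clobbered(exe, dll goRuntime) bool {
	var thread tib
	exe.setG(&thread, 0x1000) // the executable's runtime stores its pointer
	dll.setG(&thread, 0x2000) // the DLL's runtime runs on the same thread
	return exe.getG(&thread) != 0x1000
}

func main() {
	exe := goRuntime{name: "exe", slot: 5}
	fmt.Println(clobbered(exe, goRuntime{name: "dll", slot: 5})) // prints true
	fmt.Println(clobbered(exe, goRuntime{name: "dll", slot: 6})) // prints false
}
```

With distinct slots the two copies coexist, which is why a DLL build would need some other way to find its TLS slot.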

How could you provide a C compatible interface from a Go DLL without cgo?

It really depends on the tool you will use to consume your DLL. For gcc you will provide a set of .obj, .lib and .h files. For Delphi you provide a small Pascal source file. For C# you provide a small C# file (I don't know much about C#, so I could be wrong here). For Go you can provide a small Go file (similar to $GOPATH/src/syscall/zsyscall_windows.go).

You can document your interface just like Microsoft do. For example, https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx

I don't see any problem requiring gcc to build the DLL.

You lose all the nice things I've mentioned above. IMHO it is too hard for the average Windows user. It means you exclude them from building DLLs.

Alex

To provide some clarity on my motivation and goals:

What about building all your functionality as part of single Go executable
and exporting it as RPC (via TCP or similar)?

That would be a hard option to sell. It's probably triple the work, and has
performance challenges. Especially when dealing with large string
transfers, which is the majority of the work this library would be doing.

why you think what some other process is doing matters

But if, for example, we create a Go DLL with its own runtime that uses the
same slot, then calling this DLL from Go executable will be fatal.

I don't see how that can be true. The current go execution modes spec
requires one and only one copy of the go runtime per process. Unless I
misread it, all copies of the runtime must be merged when linked
(statically or dynamically.)

Unless I misread it, all copies of the runtime must be merged when linked (statically or dynamically.)

If there is only one copy of runtime, then surely it will coordinate itself properly. But I wouldn't worry about that yet. You need to build your DLL first.

Alex

@ianlancetaylor : here is the output from the build with verbose flags:

WORK=C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846
mkdir -p $WORK\command-line-arguments_obj
mkdir -p $WORK\command-line-arguments_obj\exe
cd Z:\projects\test\src\calc
CGO_LDFLAGS="-g" "-O2" "z:\projects\go\pkg\tool\windows_amd64\cgo.exe" -objdir "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj" -importpath command-line-arguments "-exportheader=C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_install.h" -- -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj" calc.go
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj" -g -O2 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_main.o" -c "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_main.c"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj" -g -O2 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_export.o" -c "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_export.c"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj" -g -O2 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj\calc.cgo2.o" -c "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj\calc.cgo2.c"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_main.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_export.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj\calc.cgo2.o" -g -O2
"z:\projects\go\pkg\tool\windows_amd64\cgo.exe" -objdir "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj" -dynpackage main -dynimport "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_.o" -dynout "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_import.go"
gcc -I "Z:\projects\test\src\calc" -m64 -mthreads -fmessage-length=0 -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_all.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_export.o" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj\calc.cgo2.o" -g -O2 -Wl,-r -nostdlib -Wl,--start-group -lmingwex -lmingw32 -Wl,--end-group
"z:\projects\go\pkg\tool\windows_amd64\compile.exe" -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments.a" -trimpath "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846" -p main -buildid 5b68efefee46667e4a728bc7de39d8436fd9e03f -D /Z/projects/test/src/calc -I "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846" -pack "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_gotypes.go" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj\calc.cgo1.go" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_cgo_import.go"
pack r "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments.a" "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj_all.o" # internal
cd .
"z:\projects\go\pkg\tool\windows_amd64\link.exe" -o "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments_obj\exe\a.out.a" -L "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846" -extld=gcc -buildmode=c-archive -buildid=5b68efefee46667e4a728bc7de39d8436fd9e03f -v "C:\Users\CHRIST~1\AppData\Local\Temp\go-build079329846\command-line-arguments.a"

command-line-arguments

HEADER = -H11 -T0x401000 -D0x0 -R0x1000
searching for runtime.a in $WORK/runtime.a
searching for runtime.a in z:\projects\go/pkg/windows_amd64/runtime.a
0.00 deadcode
0.03 pclntab=171685 bytes, funcdata total 25240 bytes
0.03 dodata
0.03 reloc
0.03 reloc
0.03 asmb
0.03 codeblk
0.05 datblk
0.05 sym
0.05 dwarf
0.05 headr
0.05 symsize = 0
0.05 symsize = 0
archive: ar -q -c -s $WORK\command-line-arguments_obj\exe\a.out.a C:\Users\CHRIST1\AppData\Local\Temp\go-link-523143211/go.o C:\Users\CHRIST1\AppData\Local\Temp\go-link-523143211/000000.o C:\Users\CHRIST~1\AppData\Local\Temp\go-link-523143211/000001.o
0.11 cpu time
25353 symbols
7280 liveness data
cp $WORK\command-line-arguments_obj_cgo_install.h test.h
cp $WORK\command-line-arguments_obj\exe\a.out.a test.a

That all looks correct. But I don't understand why your nm program can't recognize the format of the go.o file. You, or somebody, is going to have to dig into the linker to find out how it is creating the go.o file and figure out why nm doesn't understand it.

It turns out that the go.o file is getting created perfectly fine. If I pass -tmpdir to link.exe with a folder of my choice, the resulting archive is well-formed. In other words:

go build -x -work -buildmode=c-archive -ldflags="-v -tmpdir ./tmpo" -o test.a calc.go

Results in a perfectly valid test.a, which I can link against with:

gcc -o test test_driver.o test.a -lws2_32 -lntdll

I'm guessing that there is a race condition where the external linker is not finished with its job, but the temporary folder gets pulled out from under it.

This executable segfaults after running runtime.stdcall1.

(gdb) run
Starting program: Z:\projects\test\src\calc\test.exe
[New Thread 3224.0x14c]
[New Thread 3224.0xb20]
[New Thread 3224.0x38c]

Breakpoint 3, 0x0000000000422760 in runtime.stdcall1 ()
2: x/3i $pc
=> 0x422760 <runtime.stdcall1>: sub $0x10,%rsp
0x422764 <runtime.stdcall1+4>: mov %gs:0x28,%rbx
0x42276d <runtime.stdcall1+13>: mov 0x0(%rbx),%rcx

(gdb) info registers
rax 0x2a 42
rbx 0x4c41d8 4997592
rcx 0x2 2
rdx 0x24fdf0 2424304
rsi 0x5 5
rdi 0xdb1540 14357824
rbp 0x4018c0 0x4018c0 main._cgoexpwrap_f070ceaf1261_Sum
rsp 0x24fc78 0x24fc78
r8 0x18 24
r9 0xdb15b0 14357936
r10 0x0 0
r11 0x286 646
r12 0x1 1
r13 0x8 8
r14 0x0 0
r15 0x0 0
rip 0x422760 0x422760 runtime.stdcall1
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000000000042276d in runtime.stdcall1 ()
2: x/3i $pc
=> 0x42276d <runtime.stdcall1+13>: mov 0x0(%rbx),%rcx
0x422774 <runtime.stdcall1+20>: mov 0x30(%rcx),%rbx
0x422778 <runtime.stdcall1+24>: movq $0x1,0x328(%rbx)
(gdb) info registers
rax 0x2a 42
rbx 0x0 0
rcx 0x2 2
rdx 0x24fdf0 2424304
rsi 0x5 5
rdi 0xdb1540 14357824
rbp 0x4018c0 0x4018c0 main._cgoexpwrap_f070ceaf1261_Sum
rsp 0x24fc68 0x24fc68
r8 0x18 24
r9 0xdb15b0 14357936
r10 0x0 0
r11 0x286 646
r12 0x1 1
r13 0x8 8
r14 0x0 0
r15 0x0 0
rip 0x42276d 0x42276d runtime.stdcall1+13
eflags 0x10202 [ IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0

After tracing this a little more, I found this:

//gcc_libinit_windows.c
void
_cgo_wait_runtime_init_done() {
// TODO(spetrovic): implement this method.
}

I don't know if this is significant or not because it hasn't been implemented for openbsd or linux_ppc either. I can see the "default" implementation for Linux uses a pthread mutex. I could probably implement something like this for Windows/gcc. In fact, it probably wouldn't have to change much, if at all.

I copied the code verbatim and the program no longer segfaults. Instead it hangs forever inside _cgo_wait_runtime_init_done(), which probably indicates that x_cgo_notify_runtime_init_done() is never getting called.
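To make the handshake concrete, here is a minimal sketch of the same wait/notify pattern, written in Go with sync.Cond rather than in C with a pthread mutex. It is an analogy only: the real code lives in C under runtime/cgo, and the names initGate, Wait and Notify are made up for illustration. It shows why a missing notify call produces exactly the hang described above.

```go
package main

import (
	"fmt"
	"sync"
)

// initGate is a hypothetical model of the wait/notify handshake:
// callers block in Wait until some other goroutine calls Notify.
type initGate struct {
	mu   sync.Mutex
	cond *sync.Cond
	done bool
}

func newInitGate() *initGate {
	g := &initGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// Wait blocks until Notify has been called. If Notify is never called,
// Wait blocks forever.
func (g *initGate) Wait() {
	g.mu.Lock()
	for !g.done {
		g.cond.Wait()
	}
	g.mu.Unlock()
}

// Notify marks initialization as complete and wakes every waiter.
func (g *initGate) Notify() {
	g.mu.Lock()
	g.done = true
	g.cond.Broadcast()
	g.mu.Unlock()
}

func main() {
	g := newInitGate()
	go g.Notify() // stands in for the runtime finishing its startup
	g.Wait()      // stands in for an exported function called from C
	fmt.Println("runtime init done")
}
```

If the goroutine that calls Notify is removed, Wait never returns, which matches the behaviour seen when the runtime's main function is not started.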

Any pointers where to go from here?

x_cgo_notify_runtime_init_done is called under the name _cgo_notify_runtime_init_done by the function main in runtime/proc.go.

It's possible that the function main is not being called. That normally happens because the linker puts INITENTRY (the rt0 symbol) in the INITARRAY section (see addinitarrdata in cmd/link/internal/ld/data.go). However, although that works for ELF, it likely does nothing for PE. You need to figure out how to arrange for that symbol to be called at program startup time.

@ianlancetaylor I built a C++ file with a static constructor and it looks like the address of that function was put into a section called ".ctors" on Windows. Also https://gcc.gnu.org/onlinedocs/gccint/Initialization.html seems to indicate this is the correct place for this information.

The C++ .o file looks like this:

objdump -h
const.o:     file format pe-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000080  0000000000000000  0000000000000000  0000021c  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC, LOAD, DATA
  2 .bss          00000010  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC
  3 .text$_ZN4testC1Ev 00000010  0000000000000000  0000000000000000  0000029c  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, LINK_ONCE_DISCARD (COMDAT _ZN4testC1Ev 4)
  4 .xdata$_ZN4testC1Ev 00000008  0000000000000000  0000000000000000  000002ac  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_DISCARD
  5 .pdata$_ZN4testC1Ev 0000000c  0000000000000000  0000000000000000  000002b4  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA, LINK_ONCE_DISCARD
  6 .text$_ZN4testD1Ev 00000010  0000000000000000  0000000000000000  000002c0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, LINK_ONCE_DISCARD (COMDAT _ZN4testD1Ev 8)
  7 .xdata$_ZN4testD1Ev 00000008  0000000000000000  0000000000000000  000002d0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_DISCARD
  8 .pdata$_ZN4testD1Ev 0000000c  0000000000000000  0000000000000000  000002d8  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA, LINK_ONCE_DISCARD
  9 .xdata        00000024  0000000000000000  0000000000000000  000002e4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .pdata        00000024  0000000000000000  0000000000000000  00000308  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
 11 .ctors        00000008  0000000000000000  0000000000000000  0000032c  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
 12 .rdata$zzz    00000020  0000000000000000  0000000000000000  00000334  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

But the go.o file looks like this:

tmpo/go.o:     file format pe-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         000c1200  0000000000001000  0000000000001000  00000600  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE, DATA
  1 .data         00001200  00000000000c3000  00000000000c3000  000c1800  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .bss          0001f4c0  00000000000c5000  00000000000c5000  00000000  2**5
                  ALLOC

On Linux it looks like this:

tmpo/go.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00052650  0000000000000000  0000000000000000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .rodata       00040dd3  0000000000000000  0000000000000000  00053660  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  2 .typelink     000009f0  0000000000000000  0000000000000000  00094438  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  3 .gosymtab     00000000  0000000000000000  0000000000000000  00094e28  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  4 .gopclntab    00028a6b  0000000000000000  0000000000000000  00094e40  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  5 .note.go.buildid 00000038  0000000000000000  0000000000000000  000bd8c0  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .noptrdata    000003b0  0000000000000000  0000000000000000  000be000  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  7 .init_array   00000008  0000000000000000  0000000000000000  000be3b0  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  8 .data         000013b0  0000000000000000  0000000000000000  000be3c0  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  9 .bss          00023878  0000000000000000  0000000000000000  000bf780  2**5
                  ALLOC
 10 .noptrbss     00004b40  0000000000000000  0000000000000000  000e3000  2**5
                  ALLOC
 11 .tbss         00000008  0000000000000000  0000000000000000  000be000  2**3
                  ALLOC, THREAD_LOCAL
 12 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000000  2**0
                  CONTENTS, READONLY
 13 .debug_abbrev 000000ff  0000000000000000  0000000000000000  000d0bff  2**0
                  CONTENTS, READONLY, DEBUGGING
 14 .debug_line   0000a793  0000000000000000  0000000000000000  000d0cfe  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
 15 .debug_frame  0000a56c  0000000000000000  0000000000000000  000db491  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
 16 .debug_info   00027273  0000000000000000  0000000000000000  000e59fd  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
 17 .debug_pubnames 00009a03  0000000000000000  0000000000000000  0010cc70  2**0
                  CONTENTS, READONLY, DEBUGGING
 18 .debug_pubtypes 00004c30  0000000000000000  0000000000000000  00116673  2**0
                  CONTENTS, READONLY, DEBUGGING
 19 .debug_aranges 00000030  0000000000000000  0000000000000000  0011b2a3  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
 20 .debug_gdb_scripts 0000002a  0000000000000000  0000000000000000  0011b2d3  2**0
                  CONTENTS, READONLY, DEBUGGING

As an experiment I changed line 1334 in link/internal/ld/data.go to create a section called ".ctors" instead of ".init_array". However, the ".ctors" section never shows up. I added a Diag() call in data.go to make sure that the initarray was actually getting generated, and it is.

At this point, I am not really sure why the .ctors section is not appearing. I read through the doelf() function and the dope() function, but I'm clearly missing something.

I found Asmbpe() in link/internal/ld/pe.go. This actually creates PE sections for .text, .data, and .bss. I tried to add a section for ".ctors" but I'm not actually sure what the right thing to do is in the call.

Putting some hardcoded values in the call does result in a big, empty section called .ctors in the go.o file! So now I suppose I just need to write the contents of the initarray into this area. I'm not really sure how to get a hold of that data, and I'm not really sure how to size the section correctly.

I found Asmbpe() in link/internal/ld/pe.go. This actually creates PE sections for .text, .data, and .bss. I tried to add a section for ".ctors" but I'm not actually sure what the right thing to do is in the call.

Yes, you write PE sections in Asmbpe. You can look at, for example, the addimports function to see how it is done. Basically, you write whatever contents you want there and also call addpesection so that your section's entry is written to the section table later. I have never heard of the ".ctors" section, so I wouldn't know what to put there.

Putting some hardcoded values in the call does result in a big, empty section called .ctors in the go.o file! So now I suppose I just need to write the contents of the initarray into this area. I'm not really sure how to get a hold of that data, and I'm not really sure how to size the section correctly.

You use addpesection to specify the size of your section. It is your responsibility to actually write all the data to the file.

Alex

Perhaps someone can clarify some confusion for me.

In the file cmd/link/internal/ld/data.go there is a whole block of code which creates a section called .init_array, and later writes values into this section. I added some diagnostics which show that this code is called on Windows as well.

My question is, where does the section called ".init_array" go after it is created in data.go? A related question is: How do I get access to the data which is written there?

Also, I found this very interesting discussion of .init_array vs. .ctors in gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770

On Fri, Dec 11, 2015 at 5:09 AM, Christopher Nelson <
notifications@github.com> wrote:

Perhaps someone can clarify some confusion for me.

In the file cmd/link/internal/ld/data.go there is a whole block of code
which creates a section called .init_array, and later writes values into
this section. I added some diagnostics which show that this code is called
on Windows as well.

My question is, where does the section called ".init_array" go after it is
created in data.go?

Well, on ELF, it gets written to the file in a section called
".init_array", and it gets put in the DT_INIT_ARRAY dynamic tag. I'm not
sure what happens on PE.

A related question is: How do I get access to the data which is
written there?

That's kind of the wrong question. Either that approach is correct and it
should be written to the file, or it is not correct and we need to do
something else on Windows. One thing we should not do is write a
.init_array section and then read it and change it into something else. We
should do the right thing initially.

Also, I found this very interesting discussion of .init_array vs. .ctors in

gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770

Yeah, but note that the context of the bug report is systems in which the
dynamic linker implements DT_INIT_ARRAY, an ELF-specific concept. If
Windows doesn't have something like .init_array we need to do something
different on Windows.

Ian

On Fri, Dec 11, 2015 at 8:40 AM Ian Lance Taylor notifications@github.com
wrote:

On Fri, Dec 11, 2015 at 5:09 AM, Christopher Nelson <
notifications@github.com> wrote:

Perhaps someone can clarify some confusion for me.

In the file cmd/link/internal/ld/data.go there is a whole block of code
which creates a section called .init_array, and later writes values into
this section. I added some diagnostics which show that this code is
called
on Windows as well.

My question is, where does the section called ".init_array" go after it
is
created in data.go?

Well, on ELF, it gets written to the file in a section called
".init_array", and it gets put in the DT_INIT_ARRAY dynamic tag. I'm not
sure what happens on PE.

Yes, I understand that. My confusion arises from the fact that I don't see
any platform-specific code in data.go. I understand that the data "should"
and somehow eventually does make it into the .init_array section on ELF. I
would like to perform the same step for PE on Windows, except redirect the
data to the .ctors section.

In other words, I see the code here:
https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/data.go#L1333
This gets called on Windows and Linux, but I don't understand how that
actually gets picked up by the ELF writer. It certainly doesn't get picked
up by the PE writer.

I also see:
https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/data.go#L1015
That also gets called on Windows and seems to generate the data that I want
to write into the .ctors section in the PE file for Windows. However, again
I don't actually understand "where" the data is going, or how to get back
to it in Asmbpe() so that I can write it into the .ctors section.

I've looked through the ELF writer, and I see that it adds a section name
called .init_array, but it is not clear to me how that actually gets
connected to the section with the same name made in data.go.

After running some experiments, it appears that gcc on Windows exclusively
uses the .ctors data section for this kind of thing. There doesn't appear
to be an .init_array concept, and frankly for static libraries I think this
is completely compiler runtime related. For gcc, the .ctors section seems
like it will work. For MSVC there may be some other mechanism, but that's
not really important to me at the moment. If anybody reading this knows
better I would absolutely like to hear it.

I made some simple C code, which looks like this:

$ cat const_2.c
void do_this_first() __attribute__((constructor));

void do_this_first() {
  for(int i=0; i<100; ++i) {}
}

Compiled on Linux it looks like this:

$ objdump -h const_2_u.o

const_2_u.o:     file format elf64-x86-64

Sections:

Idx Name          Size      VMA               LMA               File off Algn

  0 .text         0000001a  0000000000000000  0000000000000000  00000040 2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  1 .data         00000000  0000000000000000  0000000000000000  0000005a 2**0
                  CONTENTS, ALLOC, LOAD, DATA

  2 .bss          00000000  0000000000000000  0000000000000000  0000005a 2**0
                  ALLOC

  3 .init_array   00000008  0000000000000000  0000000000000000  00000060 2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA

  4 .comment      0000002e  0000000000000000  0000000000000000  00000068 2**0
                  CONTENTS, READONLY

  5 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000096 2**0
                  CONTENTS, READONLY

  6 .eh_frame     00000038  0000000000000000  0000000000000000  00000098 2**3

$ objdump -s const_2_u.o

const_2_u.o:     file format elf64-x86-64

Contents of section .text:

 0000 554889e5 c745fc00 000000eb 048345fc  UH...E........E.
 0010 01837dfc 637ef690 5dc3               ..}.c~..].

Contents of section .init_array:
 0000 00000000 00000000                    ........

Contents of section .comment:
 0000 00474343 3a202855 62756e74 7520352e  .GCC: (Ubuntu 5.
 0010 322e312d 32327562 756e7475 32292035  2.1-22ubuntu2) 5
 0020 2e322e31 20323031 35313031 3000      .2.1 20151010.

Contents of section .eh_frame:

 0000 14000000 00000000 017a5200 01781001  .........zR..x..
 0010 1b0c0708 90010000 1c000000 1c000000  ................
 0020 00000000 1a000000 00410e10 8602430d  .........A....C.
 0030 06550c07 08000000                    .U......

And on Windows it looks like this:

$ objdump -h const_2.o

const_2.o:     file format pe-x86-64

Sections:

Idx Name          Size      VMA               LMA               File off Algn

  0 .text         00000030  0000000000000000  0000000000000000  0000012c 2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  1 .data         00000000  0000000000000000  0000000000000000  00000000 2**4
                  ALLOC, LOAD, DATA

  2 .bss          00000000  0000000000000000  0000000000000000  00000000 2**4
                  ALLOC

  3 .xdata        0000000c  0000000000000000  0000000000000000  0000015c 2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

  4 .pdata        0000000c  0000000000000000  0000000000000000  00000168 2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

  5 .ctors        00000008  0000000000000000  0000000000000000  00000174 2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA

  6 .rdata$zzz    00000020  0000000000000000  0000000000000000  0000017c 2**4

$ objdump -s const_2.o

const_2.o:     file format pe-x86-64

Contents of section .text:

 0000 554889e5 4883ec10 c745fc00 000000eb  UH..H....E......
 0010 048345fc 01837dfc 637ef690 4883c410  ..E...}.c~..H...
 0020 5dc39090 90909090 90909090 90909090  ]...............

Contents of section .xdata:

 0000 01080305 08120403 01500000           .........P..

Contents of section .pdata:

 0000 00000000 22000000 00000000           ....".......

Contents of section .ctors:

 0000 00000000 00000000                    ........

Contents of section .rdata$zzz:

 0000 4743433a 20287464 6d36342d 31292035  GCC: (tdm64-1) 5
 0010 2e312e30 00000000 00000000 00000000  .1.0............

That may not be conclusive, but it strongly suggests to me that in this
fairly simple case, if I take the data that is in .init_array on ELF, and
write it to .ctors on PE, it will probably work.

In other words, I see the code here:
https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/data.go#L1333
This gets called on Windows and Linux, but I don't understand how that
actually gets picked up by the ELF writer. It certainly doesn't get picked
up by the PE writer.

It is confusing.

There is naming confusion first. A PE file has "sections" (search for pecoff.doc if you want details), while the Go linker refers to "segments" that contain "sections". A PE "section" corresponds to a Go linker "segment". In terms of segments, the Go linker produces "text", "data" and a list of "dwarf..." segments. These contain the minimum code + data + debug_info that every platform needs. All extra bits are written by platform-specific code.

For example, by the time Asmbpe starts, the "text", "data" and "dwarf..." segments are already on disk. All the PE writing code does is make a note (addpesection) of their positions, so they can be written to the "PE section table" as the PE file is built. Asmbpe also writes whatever other PE sections the file requires, this time not just by calling addpesection but also by actually writing the file contents. For example, in addimports it writes a special PE section that the Windows program loader uses to find all the system DLLs that the Go program will use when it runs.
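As a rough illustration of the "note it, or write it and note it" pattern described here, the sketch below models a writer that keeps a list of section records next to the output bytes. Everything in it (peWriter, sectionRecord, noteSection, writeSection, and the sizes) is hypothetical and is not the linker's actual API; it only mirrors the role the explanation gives to addpesection.

```go
package main

import (
	"bytes"
	"fmt"
)

// sectionRecord is a hypothetical stand-in for one entry in the PE
// section table: a name plus where the contents sit in the output.
type sectionRecord struct {
	Name   string
	Offset int
	Size   int
}

// peWriter is a hypothetical model of the writer: the output bytes and
// the notes from which the section table is produced later.
type peWriter struct {
	out      bytes.Buffer
	sections []sectionRecord
}

// noteSection records a section whose bytes are already in the output,
// which is the role the thread attributes to addpesection.
func (w *peWriter) noteSection(name string, offset, size int) {
	w.sections = append(w.sections, sectionRecord{name, offset, size})
}

// writeSection appends new contents and then records them, as a
// platform-specific section would have to do.
func (w *peWriter) writeSection(name string, data []byte) {
	offset := w.out.Len()
	w.out.Write(data)
	w.noteSection(name, offset, len(data))
}

func main() {
	w := &peWriter{}

	// Generic linker code has already emitted text and data.
	w.out.Write(make([]byte, 32))
	w.out.Write(make([]byte, 16))
	w.noteSection(".text", 0, 32)
	w.noteSection(".data", 32, 16)

	// An extra section needs both its contents and a note.
	w.writeSection(".ctors", make([]byte, 8))

	for _, s := range w.sections {
		fmt.Printf("%-6s offset=%d size=%d\n", s.Name, s.Offset, s.Size)
	}
}
```

In this model, a section that is recorded but whose bytes are never written would still appear in the table, which is consistent with the "big, empty section" observed earlier in the thread.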

I suspect what happened here is that someone created a new "segment" in the Go linker without telling the PE writer about it. So what you need to do is either add an appropriate addpesection call somewhere in Asmbpe, or copy the writing code into pe.go, if you think that is more correct or clearer. When calling addpesection, you can name the section ".ctors" or anything you want. I am not familiar with ".ctors", so I wouldn't know whether it is required or whether it will help you.

Alex

@alexbrainman and @ianlancetaylor Thank you very much for all the help. I am now in a kind of frustrating place, so perhaps you can help me more.

I am pretty sure I know what exactly needs to be written into the PE file, at least for gcc:

  1. On Windows, gcc expects to find a list of function pointers starting with 0 and terminated with 0xffffffff in a section called .ctors. (https://gcc.gnu.org/onlinedocs/gccint/Initialization.html) It doesn't matter whether this is ELF or PE, it's the same thing.
  2. The linker will combine all of the contents of these lists in the .o files into an array called __CTOR_LIST__.
  3. The gcc runtime cooperates with the linker to generate a function called __do_global_ctors which will run through this list and call all of the functions on it.

The problem I have now is:

  1. I'm not sure how to find out the address of the _rt0_amd64_windows_lib function that should be written here.
  2. I'm not sure if I need to write a reloc entry somewhere in case the symbol gets relocated.
  3. I can't seem to figure out how to actually write binary data into the section.

I'm sorry to be a pain, but I feel like I'm really close to getting -buildmode=c-archive working; I just can't seem to get the final pieces.

I found Cntxt.AllSyms, and I can find the symbol record for "_rt0_amd64_windows_lib", but the symbol record has a lot of data in it that doesn't obviously appear to be the address (or an address.)

I also saw the Vputl and Lputl functions. However, when I call them, the resulting data doesn't appear to show up in the file.

Thanks again for all your help.

I found Cntxt.AllSyms, and I can find the symbol record for "_rt0_amd64_windows_lib", but the symbol record has a lot of data in it that doesn't obviously appear to be the address (or an address.)

I think Value contains the address of the function (but I could be wrong). But that address is an "absolute" address. I don't think an absolute address will work for you. Maybe you need the offset from the start of the PE section containing that function. Maybe you need to provide a relocation for it. I would fiddle with the corresponding object file compiled by gcc to see what is required. I have built github.com/alexbrainman/goissue10776/pedump for a similar purpose. It is probably not as good as objdump, but it is written in Go, and I can easily do with it what I like. It might also help you see how a PE file is structured.

I can't seem to figure out how to actually write binary data into the section

See addimports as an example. Vputl and Lputl and many others certainly work. The IO is buffered, so you won't see them writing to the file immediately. I also suggest you check your position in the file: you might be writing somewhere in the middle of the file instead of at the end. I am not sure what you're doing. Perhaps if you show us your changes, someone might be able to help you.

Alex

Thank you, that was very helpful. I now have data showing up in the .ctors section. I think it is not the correct data, but I'm getting closer.

I have forked the go repo and put my changes up. The commit with all the work so far is:
nadiasvertex@6a98514

The particular function that I am focusing on right now is:
https://github.com/nadiasvertex/go/blob/master/src/cmd/link/internal/ld/pe.go#L1089

According to objdump:

$ objdump -S ./tmpo/go.o | grep "_rt0"                                                                                             
000000000004f430 <_rt0_amd64_windows_lib>:

The address of the symbol I want is 0x4f430. This corresponds with what gcc does:

$ objdump -S const_2.o
0000000000000012 <do_this_first>:

$ objdump -s const_2.o
Contents of section .ctors:
 0000 12000000 00000000                    ........   

However, that's not what I'm actually getting in the data:

$ objdump -s -j .ctors tmpo/go.o 
Contents of section .ctors:
 e5000 30e40400 00000000 00000000 00000000  0...............

Or, if I don't subtract Segtext.Vaddr:

Contents of section .ctors:
 e5000 30f44400 00000000 00000000 00000000  0.D.............
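The three numbers in play here are related by simple arithmetic, which the sketch below spells out. It uses only values that appear in this thread: the -T0x401000 text base from the link log's HEADER line, the 0x1000 VMA that objdump reports for .text in go.o, and the 0x44f430 absolute value. The helper names are made up, and the code does not settle which value the external linker actually expects in .ctors.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	segtextVaddr = 0x401000 // -T value from the "HEADER =" line of the link log
	textVMA      = 0x1000   // VMA that objdump -h reports for .text in go.o
	absoluteAddr = 0x44f430 // value written when Segtext.Vaddr is not subtracted
)

// sectionOffset converts an absolute linker address into an offset from
// the start of the text segment.
func sectionOffset(abs, segBase uint64) uint64 {
	return abs - segBase
}

// ctorsEntry encodes one 8-byte little-endian .ctors slot.
func ctorsEntry(v uint64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, v)
	return b
}

func main() {
	off := sectionOffset(absoluteAddr, segtextVaddr)
	fmt.Printf("offset in text segment: %#x\n", off)
	fmt.Printf("offset + .text VMA:     %#x\n", off+textVMA)
	fmt.Printf(".ctors bytes:           % x\n", ctorsEntry(off))
}
```

So 0x4e430 is the offset from the start of the text segment, and adding the 0x1000 section VMA gives the 0x4f430 that objdump prints for _rt0_amd64_windows_lib.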

I also notice that the minimum section size I can create is 512 bytes here, but gcc appears to set the section size to the length of the actual written data. In this case, gcc makes .ctors 8 bytes long, but go makes it 512 bytes long. I wonder if that might be confusing the platform linker when it goes to merge the .ctors sections of the .o files at the end.

GCC:

$ objdump -h const_2.o

const_2.o:     file format pe-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000040  0000000000000000  0000000000000000  0000012c  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC
  3 .xdata        00000024  0000000000000000  0000000000000000  0000016c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .pdata        00000030  0000000000000000  0000000000000000  00000190  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  5 .ctors        00000008  0000000000000000  0000000000000000  000001c0  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  6 .rdata$zzz    00000020  0000000000000000  0000000000000000  000001c8  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Go:

$ objdump -h tmpo/go.o
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         000c1200  0000000000001000  0000000000001000  00000600  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE, DATA
  1 .data         00001200  00000000000c3000  00000000000c3000  000c1800  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .bss          0001f4c0  00000000000c5000  00000000000c5000  00000000  2**5
                  ALLOC
  3 .ctors        00000200  00000000000e5000  00000000000e5000  000c2a00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

I also notice that the minimum section size I can create is 512 bytes here, but gcc appears to set the section size to the length of the actual written data. In this case, gcc makes .ctors 8 bytes long, but go makes it 512 bytes long. I wonder if that might be confusing the platform linker when it goes to merge the .ctors sections of the .o files at the end.

Yes, that could break things for you.

The Go PE linker used to generate executable files only. Windows executables must have sections padded to 512 bytes (maybe some other sizes work too); otherwise the Windows program loader fails to execute them.
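The size difference between the two objdump listings (0x8 for gcc's .ctors, 0x200 for Go's) is what rounding up to a 512-byte boundary would produce. A minimal sketch, with roundUp and peFileAlign as illustrative names rather than the linker's own:

```go
package main

import "fmt"

// peFileAlign is the 512-byte padding mentioned above.
const peFileAlign = 512

// roundUp returns n rounded up to the next multiple of align (align > 0).
func roundUp(n, align uint32) uint32 {
	return (n + align - 1) / align * align
}

func main() {
	const ctorsData = 8 // one 8-byte function pointer, as in the gcc object
	fmt.Println("unpadded size:", ctorsData)
	fmt.Println("padded size:  ", roundUp(ctorsData, peFileAlign))
}
```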

Recently minux implemented code to allow external linking with cgo. That code produces an object file (for the gcc linker to use) instead of an executable. The sections of the object file are padded the same way, but there are only 2 sections present: .text and .data. gcc does not complain about that.

While trying to find a solution for issue #10776 (see cl 13571), I discovered that dwarf sections must not be padded; otherwise gcc complains. You can change your code not to pad the .ctors section and see what happens. You can pinch some of my code from cl 13571. I think none of the sections in an object file should be padded, but that is not easy to change: by the time the PE linker knows whether it is building an executable or an object file, the alignment has already been set. We need to rearrange the code for that, but I haven't had time to fiddle with it.

I also suggest you check if your C object file has any relocations. You might have to create them too.

Alex

I've made some more progress. I manually killed the padding on this one section after calling addpesection, and that results in the right values in the section header later.

It turns out that I need to emit a relocation entry for the .ctors section (which makes perfect sense.) I was unable to reuse the peemitreloc code because it wants to walk a whole symbol chain, and I just want to write one symbol.

In any case, I looked at the PE spec from MS, and I think I have most of the information I need. However, I don't seem to be able to get the correct SymbolTableIndex, which is "A zero-based index into the symbol table. This symbol gives the address that is to be used for the relocation. If the specified symbol has section storage class, then the symbol's address is the address with the first section of the same name."

Any pointers here would be really helpful.

Also note that gcc appears to generate the relocation IN the .ctors section, AGAINST the .text section:

const_2.o:     file format pe-x86-64

RELOCATION RECORDS FOR [.ctors]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       .text

Latest code is: nadiasvertex@f0894cb

If you have a *LSym value s, then the symbol index you are looking for is s.Dynid. But if that value is negative, then no symbol index has been assigned, and something has gone wrong.

A reloc against the .text section is really a reloc against a symbol named .text defined at offset 0 in the section named .text.

I don't seem to be able to get the correct SymbolTableIndex ...

Each PE section has relocation table. Each relocation record is:

type Reloc struct {
    VirtualAddress   uint32
    SymbolTableIndex uint32
    Type             uint16
}

VirtualAddress is the offset, within the PE section this relocation table belongs to, of the data that needs to be adjusted. Type determines how to adjust it. SymbolTableIndex points to a "symbol" that contains all the information the external linker needs to produce the relocation value. SymbolTableIndex is an index into the PE "symbol table". You can read about it in pecoff.doc; it lives somewhere after all the PE sections and is pointed to by PointerToSymbolTable. You should be able to use objdump to look at the PE symbol table of your C obj file and do something similar. You can look at addpesymtable in pe.go to see how we write the symbol table. Perhaps Ian is correct that it points to the whole .text section, but you would have to write a corresponding entry into the symbol table anyway.

Alex
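To make the record layout above concrete, here is a minimal Go sketch that serializes one relocation record in the 10-byte little-endian form. It is not the code from pe.go: encodeReloc is a hypothetical helper, the symbol index 7 is made up, and the constant value 0x0001 for IMAGE_REL_AMD64_ADDR64 (the type objdump prints as R_X86_64_64) is taken from the PE/COFF specification.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Reloc mirrors the relocation record described above.
type Reloc struct {
	VirtualAddress   uint32 // offset inside the section being relocated
	SymbolTableIndex uint32 // zero-based index into the PE symbol table
	Type             uint16 // relocation type
}

// imageRelAMD64Addr64 is IMAGE_REL_AMD64_ADDR64: a 64-bit absolute address.
const imageRelAMD64Addr64 = 0x0001

// encodeReloc writes the record as packed little-endian bytes.
// binary.Write emits struct fields back to back with no padding,
// so the result is 4 + 4 + 2 = 10 bytes.
func encodeReloc(r Reloc) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, r); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// A relocation at offset 0 of .ctors against a hypothetical symbol index 7.
	b, err := encodeReloc(Reloc{
		VirtualAddress:   0,
		SymbolTableIndex: 7,
		Type:             imageRelAMD64Addr64,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: % x\n", len(b), b)
}
```

The symbol index is the part that has to agree with whatever the symbol table writer assigned, which is why a negative or unassigned index signals that something went wrong earlier.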

A colleague helped me make additional progress. He added some logic that adjusts the existing relocation code to use the .ctors section. It now appears that we have the correct data in the correct place:

$ objdump -r -j .ctors tmpo\go.o 

tmpo\go.o:     file format pe-x86-64

RELOCATION RECORDS FOR [.ctors]:
OFFSET           TYPE              VALUE 
0000000000000000 R_X86_64_64       _rt0_amd64_windows_lib-0x000000000004f430

This looks slightly different from what gcc outputs, where the value of the relocation is ".text". However, if I understand @ianlancetaylor then the value is just a name, and it's not important. This is exactly the symbol we want relocated, and the address is correct (0x4f430).

For some reason if I link the object files produced by Go into an .a file, the .ctors section doesn't get picked up. However, if I use gcc to link the 3 Go files with a C driver the .ctors data does get picked up:

$ gdb --silent  <dump_ctorlist.txt
(gdb) Reading symbols from test.exe...done.
(gdb) 
0x4cb5f0 <___CTOR_LIST__>:      0xffffffffffffffff      0x000000000089ee40
0x4cb600 <___CTOR_LIST__+16>:   0x00000000004cb5e0      0x0000000000000000
0x4cb610 <___DTOR_LIST__>:      0xffffffffffffffff      0x0000000000000000
0x4cb620:       Cannot access memory at address 0x4cb620

The fly in the ointment here is that the relocation happens incorrectly. The 0x000000000089ee40 is the entry that should call _rt0_amd64_windows_lib (0x00000000004cb5e0 = register_frame_ctor in section .text, part of gcc's init code).

However the address is wrong. GDB reports:

0x44fa10 <_rt0_amd64_windows_lib>

So I would expect that the entry in that list would be 0x44fa10.

I'm guessing that we specified the relocation wrong. Code is here: nadiasvertex@39b73bf

Any guidance would be helpful.

It may or may not help you, but: I've once also had to implement some relocations-related code, for building contents of a ".rsrc" section (embedded icons etc.) for Win32 for Go (the tool builds an .o/.syso file). Unfortunately, I don't remember what the calculations meant already (not sure if I even really understood them when writing the stuff), but in case it could help you in any way, the relevant code seems to be here:
https://github.com/akavel/rsrc/blob/ba14da1f827188454a4591717fff29999010887f/coff/coff.go#L386-L387

Quick explanation how the Walk block in l.371-393 works: it sequentially and recursively (DFS) walks through a Coff struct (which represents a "sketch"/template of a Coff output file), and feeds a string representing the "current path in the Coff struct" (e.g. "/Dir/Dirs[1]" means coff.Dir.Dirs[1]) to the closure, while "offset" contains the byte position that the field would have in the final output file (it advances by virtue of the freezeCommon2 call in l.392). The closure updates contents of various fields which in the final file must be filled based on this offset.

Not sure if that can or cannot be helpful for you; sorry if the latter, but just in case.

My colleague and I have gotten to the point that the _rt0_amd64_windows_lib initializer is called correctly from an executable linked to the Go generated static library. That's pretty exciting.

The problem now is that the executable segfaults. I'll briefly explain where I think it is happening, and maybe some guidance can be provided on what the "right" thing to do is.

First, I understand that we have to pass in command-line arguments somehow. On Windows this appears to be really complicated, so I'm not even dealing with it yet. By the time we get to the initialization routine, the registers which held the information from the Windows kernel have been overwritten.

So currently I am initializing the memory locations that we pull this stuff back out of to 0x0. Maybe this is bad, but it seems like the code in the runtime should just skip processing the args if I set them to empty.

The runtime passes through into x_cgo_sys_thread_create(), which receives _rt0_amd64_windows_lib_go as the function to spawn. pthread_create is given this value and returns success.

Walking out, we exit the __do_global_ctors() function fine. In the meantime, the initialized thread starts to run. This thread ends up segfaulting fairly quickly, by jumping to address 0x00000000ffffffff. Clearly this is wrong.

Are there any suggestions where I should look to narrow down why it's going off into the weeds?

First, I understand that we have to pass in command-line arguments somehow. On Windows this appears to be really complicated,

On Windows, command-line arguments are handled by the kernel. Your executable doesn't need to do anything specific except retrieve them when and if they are needed (use the appropriate Windows APIs). The Go runtime does this in the goenvs function. This is all true for a Go (pure and cgo) executable. I don't know what a Go DLL does.

Are there any suggestions where I should look to narrow down why it's going off into the weeds?

I don't know what happens in Go DLL. Perhaps others will help.

Alex

@alexbrainman To be clear, this is not the .dll. This is a static library. Currently we can generate code that the C runtime can link against, and it runs the runtime initializer. The runtime initializer crashes somewhere, but there is no backtrace because it jumped to random code way up in the address space.

Still working on runtime initialization. I've adjusted the assembly to avoid trying to setup the command-line arguments. I now get a clean traceback. It looks like the init thread aborts before it ever gets to the runtime initialization part.

The addresses passed in to pthread_create look correct. When I change the code to branch directly to the runtime initializer (instead of spawning a thread) it seems to complete correctly. So it looks like it's isolated to bad setup of the thread.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1280.0x670]
0x000007fefd921520 in msvcrt!_HUGE () from C:\Windows\system32\msvcrt.dll
(gdb) bt
#0  0x000007fefd921520 in msvcrt!_HUGE () from C:\Windows\system32\msvcrt
#1  0x00000000004c7f57 in pthread_create_wrapper ()
#2  0x000007fefd89415f in srand () from C:\Windows\system32\msvcrt.dll
#3  0x000007fefd896ebd in msvcrt!_ftime64_s ()
   from C:\Windows\system32\msvcrt.dll
#4  0x0000000076fb5a4d in KERNEL32!BaseThreadInitThunk ()
   from C:\Windows\system32\kernel32.dll
#5  0x00000000770eb831 in ntdll!RtlUserThreadStart ()
   from C:\Windows\SYSTEM32\ntdll.dll

You may want to call some Windows-specific function like _beginthread or CreateThread rather than pthread_create.

I also want to warn you about using gdb. The Go linker does not write DWARF information into the object file (see issue #10776).

Alex

Turns out that the runtime code I borrowed from Linux uses a different argument-passing protocol than gcc on windows uses. After fixing that, everything works fine. My simple test runs to completion with no errors.

The changes are nadiasvertex@1fc1f59 and nadiasvertex@81e3aa8

My current build script is awkward because there are still two bugs left to figure out:

  1. Something deletes the temporary .o files before "ar" is finished with them (or possibly something isn't done writing them before 'ar' reads them.)
  2. For some reason 'ld' ignores the .ctors section when the .o is in an .a, but when it is a raw .o it works fine. In other words 'ld -o test driver.o test.a' fails but 'ld -o test driver.o tmpo/go.o tmpo/000001.o tmpo/000000.o' works perfectly fine.

I'm wondering if the linker is using any kind of asynchrony for (1), and for (2) I have no clue. Any ideas would be helpful.

I ran some tests, it looks like the second error above is actually fixed. Probably the bad data in the .o file was causing the linker to ignore whatever we wrote, and now that it's correct it works.

The first problem above is still a real issue. If I run

go build -buildmode=c-archive -ldflags="-tmpdir .\tmpo" -o test.a calc.go
gcc -o test test_driver.o test.a -lws2_32 -lntdll

The output is perfect, everything works, rainbows and unicorns.

If I run

go build -buildmode=c-archive  -o test.a calc.go
gcc -o test test_driver.o test.a -lws2_32 -lntdll

Then I get:

test.a(000000.o): In function `Sum':
C:/Users/CHRIST~1/AppData/Local/Temp/go-build304995930/command-line-arguments/_obj/_cgo_export.c:19: undefined reference to `_cgoexp_f070ceaf1261_Sum'
C:/Users/CHRIST~1/AppData/Local/Temp/go-build304995930/command-line-arguments/_obj/_cgo_export.c:19: undefined reference to `crosscall2'
test.a(000000.o):calc.cgo2.c:(.rdata$.refptr._cgoexp_f070ceaf1261_Sum[.refptr._cgoexp_f070ceaf1261_Sum]+0x0): undefined reference to `_cgoexp_f070ceaf1261_Sum'
collect2.exe: error: ld returned 1 exit status

Which is similar to a problem I described last week. The objdump output indicates that the go.o file has not apparently been completely consumed:

$ objdump -a test.a
In archive test.a:
objdump: go.o: File format not recognized

000000.o:     file format pe-x86-64
rw-rw-rw- 0/0   4165 Dec 19 13:53 2015 000000.o


000001.o:     file format pe-x86-64
rw-rw-rw- 0/0  16188 Dec 19 13:53 2015 000001.o

I have a workaround for problem (1). The workaround is silly, but it seems pretty stable. Right before spawning 'ar' to create the C archive, I added a call to os.Stat() to see whether go.o exists and, if so, whether it is empty. After I added this call the build worked perfectly every time.

Code is here: nadiasvertex@0df6140

At this point, all work for issue #13494 should be complete. Of course, it will need review and testing. I'll read the "how to contribute" stuff to submit a patch. As time permits I'll return to the c-shared bug and see what is needed for that.

I submitted a patch for the issue mentioned above, and started work on c-shared. The initial work of generating a .dll is done, but it has references to unexpected files. For example, when I run my test rig it says it cannot find a.out.exe. If I rename the output file "test.dll" to "a.out.exe" the test system runs and crashes. The dependency is probably just an artifact of how the linker generates output. If I use the final output name during the generation it will probably be okay.

The segfault occurs in runtime.rt0_go, here is the context:

   0x000000006ab8ba86 <+246>:   mov    %gs:0x28,%rbx
=> 0x000000006ab8ba8f <+255>:   movq   $0x123,%fs:(%rbx)
   0x000000006ab8ba97 <+263>:   mov    0x852da(%rip),%rax        # 0x6ac10d78 <runtime.m0+88>
   0x000000006ab8ba9e <+270>:   cmp    $0x123,%rax
   0x000000006ab8baa4 <+276>:   je     0x6ab8baad <runtime.rt0_go+285>

Registers are:

(gdb) info reg
rax            0x7a318c         8008076
rbx            0x6ac10d78       1791036792
rcx            0x6ac10a20       1791035936
rdx            0x0              0
rsi            0x6ab8d850       1790498896
rdi            0x6ac10d78       1791036792
rbp            0x0              0x0
rsp            0x99fed0         0x99fed0
r8             0x7fffffdb000    8796092870656
r9             0x62             98
r10            0x100000         1048576
r11            0x99fa98         10091160
r12            0x0              0
r13            0x0              0
r14            0x0              0
r15            0x0              0
rip            0x6ab8ba8f       0x6ab8ba8f <runtime.rt0_go+255>
eflags         0x10202          [ IF RF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x0              0

I'm really not sure what the expected values are supposed to be here, or why they are wrong when the c-archive has them correct.

Code is here: nadiasvertex@4a4e022

That looks like the test in rt0_go that TLS has been set up correctly. I guess it hasn't, in fact, been set up correctly!

The question of what should be done about TLS is probably above my pay grade, as it seems fundamental to the ABI of the language. Is there a discussion about options somewhere? If not, can I facilitate such a conversation in some way? It doesn't seem to make much sense for me to continue working on this until there is some consensus around what ought to be done.

@nadiasvertex I don't see how it is possible for it to crash in runtime.rt0_go. Can I see it for myself? How do you make it crash?

But yes, we would have to change the way we do TLS if we want to have a Go DLL loaded by a Go executable. From what I can gather on the net, we can use PE file facilities (search for .tls in pecoff.doc), or we can use the TlsGetValue Windows API. Given that you use gcc to generate the DLL, PE file fiddling is not an option, unless someone knows how to arrange for gcc to help with that. That leaves us with TlsGetValue. I am concerned that a TlsGetValue call might be expensive compared with what we do now, but I don't know of any alternative. Maybe others will comment too.

Alex

https://github.com/nadiasvertex/go/tree/win-shared has the code. Just build a .dll and call a Go function from a C function.

@nadiasvertex I tried your win-shared branch.

package main

//export Foo
func Foo() {
    println("foo")
}

func main() {
}

and build

go build --buildmode=c-shared -o go.dll
nm -s go.dll

https://gist.github.com/mattn/ad8ed59a3efe86db2278

It seems Foo is not exported.

I ran make.bat on https://github.com/nadiasvertex/go/tree/win-shared, but it fails with:

runtime/cgo
# runtime/cgo
runtime\cgo\gcc_libinit_windows.c:9:21: fatal error: pthread.h: No such file or directory
 #include <pthread.h>
                     ^
compilation terminated.

How do I fix this failure?

My gcc:

c:\dev\winshared\src>gcc --version
gcc (GCC) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Alex

I used the TDM GCC distribution at http://tdm-gcc.tdragon.net and it just works. If you can't include pthread, you probably don't have it installed. If you're using mingw you may need to make sure that package is installed.

It should be possible to write runtime/cgo/gcc_libinit_windows.c without using pthread at all. It can (and should) use ordinary Windows calls instead.

What do you mean by "ordinary" Windows calls? The msvcrt, or direct win32 API?

_cgo_sys_thread_start in $GOROOT/src/runtime/cgo/gcc_windows_amd64.c uses _beginthread. Maybe do the same.

You can also use Windows CreateThread function. I don't know what is the right thing to do here, because I don't know much about gcc.

Alex

This isn't really a GCC issue. You can disregard the fact that gcc_libinit_windows.c has a name that starts with "gcc". The point is, the file contains C code, and is compiled with a C compiler. It defines three functions:

x_cgo_sys_thread_create must start a new thread running func, passing arg to it. This is the win32 CreateThread function or the Visual Studio _beginthread function.

x_cgo_wait_runtime_init_done must wait until x_cgo_notify_runtime_init_done has been called. x_cgo_notify_runtime_init_done must let x_cgo_wait_runtime_init_done execute. I don't know much about Windows synchronization, but I would guess this could be done using the win32 CreateSemaphore and WaitForSingleObject functions.
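The wait/notify contract described above can be modelled in a few lines of Go. This is only a model of the synchronization pattern, not the C code the runtime needs: the real functions would be written in C against Win32 primitives, and the names initGate, notify and wait are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// initGate models the contract between the two C functions:
// wait blocks until notify has been called at least once.
type initGate struct {
	once sync.Once
	done chan struct{}
}

func newInitGate() *initGate {
	return &initGate{done: make(chan struct{})}
}

// notify plays the role of x_cgo_notify_runtime_init_done.
// Closing the channel releases every current and future waiter.
func (g *initGate) notify() {
	g.once.Do(func() { close(g.done) })
}

// wait plays the role of x_cgo_wait_runtime_init_done.
func (g *initGate) wait() {
	<-g.done
}

func main() {
	g := newInitGate()
	var wg sync.WaitGroup
	wg.Add(1)
	// This goroutine stands in for the thread started by x_cgo_sys_thread_create.
	go func() {
		defer wg.Done()
		// Runtime initialization would happen here.
		g.notify()
	}()
	g.wait() // a caller of an exported function would block here
	wg.Wait()
	fmt.Println("runtime init done")
}
```

The important property is that a waiter arriving after the notification returns immediately, so whatever Windows primitive is chosen has to stay signalled once initialization is complete.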

I will make these adjustments to this code and the c-archive patch.

I have removed the pthread-specific code and pushed it into both the c-archive patch in the official tree, and into the win-shared branch of my github fork of Go. If you would like to take another look at the error, you may get farther. Thanks!

Changes are here: nadiasvertex@d88e7e0
Code is here: https://github.com/nadiasvertex/go/tree/win-shared

Running make.bat still fails:

...
runtime/cgo
# runtime/cgo
runtime\cgo\gcc_libinit_windows.c:9:21: fatal error: pthread.h: No such file or directory
 #include <pthread.h>
                     ^
compilation terminated.
runtime/race
testing/iotest
...

I hope I am running correct version:

c:\dev\winshared\src>git rev-parse HEAD
d88e7e06bea1c7f0a33f87f6cddbdbe149a80619

c:\dev\winshared\src>git status
On branch win-shared
Your branch is up-to-date with 'origin/win-shared'.

nothing to commit, working directory clean

c:\dev\winshared\src>

Alex

I made a typo when I migrated the patch from my c-archive branch. I updated gcc_libinit.c instead of gcc_libinit_windows.c. The code has been updated, and I verified that I patched the right file this time.

The code has been updated

I can build Go now. Thank you.

Just build a .dll and call a Go function from a C function.

I don't know how to do that. I used

go build --buildmode=c-shared -o go.dll

command and that creates a go.dll file. How do you call it from a C function? What are the commands? Thank you.

Alex

I've tried to use TlsGetValue

alexbrainman/winapi@a05e0a7

and it works fine. TlsGetValue

(gdb) disas
Dump of assembler code for function TlsGetValue:
=> 0x7c8097e0 <+0>:     mov    %edi,%edi
   0x7c8097e2 <+2>:     push   %ebp
   0x7c8097e3 <+3>:     mov    %esp,%ebp
   0x7c8097e5 <+5>:     mov    %fs:0x18,%eax
   0x7c8097eb <+11>:    mov    0x8(%ebp),%ecx
   0x7c8097ee <+14>:    cmp    $0x40,%ecx
   0x7c8097f1 <+17>:    jae    0x7c845054 <SetUnhandledExceptionFilter+367>
   0x7c8097f7 <+23>:    andl   $0x0,0x34(%eax)
   0x7c8097fb <+27>:    mov    0xe10(%eax,%ecx,4),%eax
   0x7c809802 <+34>:    pop    %ebp
   0x7c809803 <+35>:    ret    $0x4
End of assembler dump.
(gdb)

looks fairly small, so it is, probably, an option for us (I didn't check amd64 version). I also have found this article https://msdn.microsoft.com/en-us/library/windows/desktop/ms686997(v=vs.85).aspx about how to do the same in a DLL, so we can use that too.

Is that something we want to do? Perhaps there is a way to handle TLS via gcc, maybe it will be simpler.

Alex

PS: We used to have a get_tls asm macro defined in the runtime package, but now it just says

#define get_tls(r)      MOVL TLS, r

How is MOVL TLS, r implemented?

mbk commented

OT: I really hope you succeed for 1.6 and admire your persistence in tracking down all the little issues m

@alexbrainman

I have a "driver" .c file that looks like this:

#include "test.h"
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
  for(int i=0; i<100; i++) {
     for(int j=0; j<100; j++) {
         GoInt v1 = Sum(i,j);
         GoInt v2 = i+j;

         printf("Sum=%lld expect %lld\n", v1, v2);

         if (v1!=v2) {
            abort();
         }
     }
  }
  return 0;
}

I have a test .go file that looks like this:

package main

import "C"


//export Sum
func Sum(x, y int) int {
        return x+y
}

func main() {

}

I build using the following commands:

go build -buildmode=c-shared -o test.dll test.go
gcc -c -o test_driver.o test_driver.c
gcc -o test-lib test_driver.o test.dll -lws2_32 -lntdll
copy test.dll a.out.exe

That's when I experience the TLS issue.

And where am I supposed to get the test.h file from?

Alex

It is generated by cgo.

I had to modify your C program slightly (my gcc cannot handle for() loops), and everything builds now:

c:\dev\src\issues\issue11058>dir
 Volume in drive C has no label.
 Volume Serial Number is D2A1-D2A1

 Directory of c:\dev\src\issues\issue11058

04/01/2016  12:50 PM    <DIR>          .
04/01/2016  12:50 PM    <DIR>          ..
04/01/2016  12:45 PM    <DIR>          .hg
04/01/2016  12:45 PM               115 test.go
04/01/2016  12:47 PM               246 test_driver.c
               2 File(s)            361 bytes
               3 Dir(s)   3,758,317,568 bytes free

c:\dev\src\issues\issue11058>type test.go
package main

import "C"


//export Sum
func Sum(x, y int) int {
        return x+y
}

func main() {

}
c:\dev\src\issues\issue11058>type test_driver.c
#include "test.h"
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
        int i=2, j=3;
        GoInt v1 = Sum(i,j);
        GoInt v2 = i+j;

        printf("Sum=%lld expect %lld\n", v1, v2);

        if (v1!=v2) {
        abort();
        }
        return 0;
}
c:\dev\src\issues\issue11058>go build -buildmode=c-shared -o test.dll test.go

c:\dev\src\issues\issue11058>gcc -c -o test_driver.o test_driver.c

c:\dev\src\issues\issue11058>gcc -o test-lib test_driver.o test.dll -lws2_32 -lntdll

c:\dev\src\issues\issue11058>

But when I run test-lib.exe, I get this error https://gist.github.com/alexbrainman/72057484f5a42821ca86#file-cgoerror-jpg.

Alex

The Go linker generates the .dll with the name a.out, then renames it. This doesn't work on Windows. I will fix that, but for now you have to also copy test.dll to a.out. I forgot to mention that in the instructions.

you have to also copy test.dll to a.out.

copy test.dll a.out.exe does the trick. Thank you. I need to debug your original problem now.

Alex

@nadiasvertex I can reproduce the crash in runtime.rt0_go. It is crashing on https://github.com/nadiasvertex/go/blob/master/src/runtime/asm_amd64.s#L112 line. I have used objdump to disassemble test.dll. Here is the interesting part:

    6dc8bae3:   e8 18 38 00 00          callq  6dc8f300 <runtime.settls>
    6dc8bae8:   65 48 8b 1c 25 28 00    mov    %gs:0x28,%rbx
    6dc8baef:   00 00 
    6dc8baf1:   64 48 c7 03 23 01 00    movq   $0x123,%fs:(%rbx)
    6dc8baf8:   00 
    6dc8baf9:   48 8b 05 78 e2 07 00    mov    0x7e278(%rip),%rax        # 6dd09d78 <runtime.m0+0x58>

If you compare it to a similar part of simple Go executable (my go.exe):

  4e2551:   e8 ea 3c 00 00          callq  4e6240 <runtime.settls>
  4e2556:   65 48 8b 1c 25 28 00    mov    %gs:0x28,%rbx
  4e255d:   00 00 
  4e255f:   48 c7 83 00 00 00 00    movq   $0x123,0x0(%rbx)
  4e2566:   23 01 00 00 
  4e256a:   48 8b 05 27 9f 79 00    mov    0x799f27(%rip),%rax        # c7c498 <runtime.m0+0x58>

you will see that g(BX) is translated into 0x0(%rbx), while your new test.dll has %fs:(%rbx). I suspect that translation is your problem. I don't know how g(BX) gets converted; perhaps others will help.

Mind you, I am not sure if you should spend your time fixing this. Even if you make the existing code work, you would still have to come up with a different way to do TLS inside your DLL (see the earlier discussion).

Alex

That general kind of transformation occurs in cmd/internal/obj/x86/asm6.go. I have to admit that I have no idea how that specific transformation is occurring in this case.

Thank you @ianlancetaylor for suggestion. I will let @nadiasvertex decide what to do here.

Alex

Is there any workaround for this at the moment? Perhaps a manual approach that lets Go generate the object files that I can link together manually with gcc?

@nadiasvertex Is go.o ever explicitly closed after it's written to? I'm wondering if that could be why you need to do the os.Stat workaround to get it to flush to disk.

@jtsylve There is currently no way to generate a functional .dll straight from a Go package on Windows. However, the -buildmode=c-archive patch for Windows is working through the acceptance process. The patch works for me in the limited testing I have done with it. You could download the patch and apply it to the compiler source, then generate a static library with it. You could then write a .c file which exposes .dll entry points and calls into the static library, and generate a .dll from that .c file and the .a file.

It would be useful to see someone test it in a larger scope, so if you would like to try this out I would be happy to help you get it working.

@alexbrainman I'm not sure what I am deciding. The shared library support is pretty important in my environment, so deciding not to do it isn't really much of an option for me.

If you are asking me to track down the compiler issue and figure out why it is misbehaving, that is fine. As long as I can ask questions of people who might know the answers. :-) Also, I have a heavy workload right now, so I might move slowly.

I'm not sure what I am deciding. ...

You need to fix the crash in runtime.rt0_go. You can fix the crash by changing some code in cmd/internal/obj/x86/asm6.go to generate the same code in both the executable and DLL versions of runtime.rt0_go. But that is not a complete solution. If you reread the earlier discussion about TLS, we all agreed that a Go DLL cannot use the same approach for TLS as the current Go executable does. If a Go DLL is loaded by a Go executable, they will be stepping on each other's toes. Perhaps it is OK just to get the first Go DLL going, but ultimately we'll need to do something different here. There is a bigger fish to fry here.

As long as I can ask questions of people who might know the answers.

Everyone is happy to help you with what they know.

I might move slowly.

No one else wants to do it at the moment. It is completely up to you.

Alex

@nadiasvertex I was able to build and use win-archive to produce a static library, but couldn't get the win-shared to build. The build freezes as shown below.

C:\GoP2\src>make.bat
##### Building Go bootstrap tool.
cmd/dist


Any ideas?

Yes. The win-shared code is broken because the compiler is not emitting the right TLS primitives. This causes the deadlock you are experiencing. Exactly what the right primitives are is a matter I have to investigate.