golang/go

cmd/link: Apple's symbols tool unable to read DWARF data from c-archive go.o

Closed this issue · 21 comments

tmm1 commented

Apple's /usr/bin/symbols is unable to parse DWARF data embedded by the golang compiler, making iOS app using gomobile hard to debug because their crash reports are missing symbols.

package main

import "C"
import "log"

func main() {
    log.Printf("foo")
}
$ go version
go version devel +3b6f9a0b9b Fri Apr 12 17:45:10 2019 -0700 darwin/amd64

$ CGO_CFLAGS=-g go build -buildmode=c-archive -o test.a test.go
$ ar xv test.a
x - go.o
x - 000000.o

$ symbols go.o
go.o [x86_64, 0.012018 seconds]:
    null-uuid                            go.o [OBJECT, Empty]
        0x0000000000000000 (0x3332de)  SEGMENT

$ symbols 000000.o
000000.o [x86_64, 0.000155 seconds]:
    null-uuid                            000000.o [OBJECT, FaultedFromDisk]
        0x0000000000000000 (    0x98)  SEGMENT
            0x0000000000000000 (    0x24) __DWARF __apple_names
            0x0000000000000024 (    0x24) __DWARF __apple_objc
            0x0000000000000048 (    0x24) __DWARF __apple_namespac
            0x000000000000006c (    0x2c) __DWARF __apple_types

$ symbols -v
symbols version:			@(#)PROGRAM:symbols  PROJECT:SamplingTools-64460.8
CoreSymbolicationDT.framework version:	64460.7

I filed a bug report with Apple about this, and they verified an issue in symbols:

Thanks for bringing this to our attention. Our Developer Tools team has analyzed your object files, in particular the one with DWARF, and identified potential improvements for how DWARF data is extracted from an object file by the symbols tool. Specifically, the tool is currently short-circuiting because it encountered a DW_TAG_inlined_subroutine abbreviation table that includes a pair of DW_AT_call_line, DW_FORM_udata. Using DW_FORM_udata is valid DWARF, so handling this better is something the team may consider for the future. We can’t guarantee if or when this may happen.

One change for you to try and make this better right now — could you try using DW_FORM_data1 or DW_FORM_data2 in place of DW_FORM_udata? If you do so, does your interaction with the symbols data improve to a workable level?

Based on the above, I tried this patch which makes symbols go.o work as expected once applied:

diff --git a/src/cmd/internal/dwarf/dwarf.go b/src/cmd/internal/dwarf/dwarf.go
index df80039063..b0349a15bc 100644
--- a/src/cmd/internal/dwarf/dwarf.go
+++ b/src/cmd/internal/dwarf/dwarf.go
@@ -434,7 +434,7 @@ var abbrevs = [DW_NABRV]dwAbbrev{
                        {DW_AT_low_pc, DW_FORM_addr},
                        {DW_AT_high_pc, DW_FORM_addr},
                        {DW_AT_call_file, DW_FORM_data4},
-                       {DW_AT_call_line, DW_FORM_udata},
+                       //{DW_AT_call_line, DW_FORM_udata},
                },
        },

@@ -446,7 +446,7 @@ var abbrevs = [DW_NABRV]dwAbbrev{
                        {DW_AT_abstract_origin, DW_FORM_ref_addr},
                        {DW_AT_ranges, DW_FORM_sec_offset},
                        {DW_AT_call_file, DW_FORM_data4},
-                       {DW_AT_call_line, DW_FORM_udata},
+                       //{DW_AT_call_line, DW_FORM_udata},
                },
        },

I also tried replacing FORM_udata with FORM_data1 and FORM_data2, but that did not fix symbols and also caused dwarfdump to report errors in the generated go.o.

@aclements Can you offer any advice here? (If you're testing locally, note that you will need 3cb92fc for symbols go.o to work at all.)

cc #31022 #28997 @eliasnaur

Related, but I don't think it is your bug: https://go-review.googlesource.com/c/go/+/170638 .

Dwarfdump is afflicted by some of these same LLVM bugs, so its diagnoses are not 100% trustworthy except as an indicator that Apple's tools are likely to reject an input. You might want to get your hands on a copy of gnu binutils (Macports or Homebrew) and see what [g]objdump says.

thanm commented

@tmm1 thanks for the analysis and for getting in touch with the Apple support folks.

You posted a patch that changes the definition of the abbrev entry for the inlined routine DIE, but I don't see any changes to other related parts of dwarf.go. Could you please post the entire patch? (or if this the entire patch, please confirm)?

Switching from FORM_udata to FORM_data4 is certainly doable, but it would increase the overall size of the DWARF (not good) and goes against the advice given in the DWARF standard (e.g. DWARF 4 spec section 7.5.4 " Producers are therefore strongly encouraged to use DW_FORM_sdata or DW_FORM_udata for signed and unsigned integers respectively, rather than DW_FORM_data).

thanm commented

Sorry, hit the wrong button there (didn't mean to close the issue).

tmm1 commented

Could you please post the entire patch? (or if this the entire patch, please confirm)?

I did not change anything else, but I am not very familiar with dwarf or this part of the code so I didn't know what else might need to change.

thanm commented

The code you modified controls how the .debug_abbrev section is emitted, but for that change to take effect you also have to change other parts the compiler that emit DWARF DIE data against that template. When I pull in your patch and rebuild (make/bash), I see a bunch of errors in the linker's dwarf tests (e.g. "cd src/cmd/link/internal/ld ; go test -test.v"):

--- FAIL: TestAbstractOriginSanityIssue25459 (0.26s)
    dwarf_test.go:737: error reading DWARF: unterminated child sequence
--- FAIL: TestVarDeclCoordsWithLineDirective (0.26s)
    dwarf_test.go:364: error reading DWARF: unterminated child sequence
--- FAIL: TestRuntimeTypeAttrInternal (0.26s)
    dwarf_test.go:939: error reading DWARF: unterminated child sequence
--- FAIL: TestVarDeclCoordsAndSubrogramDeclFile (0.27s)
    dwarf_test.go:364: error reading DWARF: unterminated child sequence
--- FAIL: TestAbstractOriginSanityIssue26237 (0.39s)
    dwarf_test.go:737: error reading DWARF: unterminated child sequence
...
tmm1 commented

@thanm Thanks this makes a lot more sense now. So I'm generating broken DWARF, but since it is no longer broken in the way symbols dislikes (i.e containing a pair of DW_AT_call_line,DW_FORM_udata), the tool is able to read it.

I understand the downsides of switching from udata, and it's unfortunate the Apple tool doesn't conform to the spec here. I've added that section to my bug report and hope this is something they can fix in the future.

In the mean time it appears our only option would be a patch to switch to data4. Would something like this be merged despite the downsides? Would it need to be behind an optional flag?

After some testing, I've found that this patch generates valid symbols that can be used in the app store for iOS:

diff --git a/src/cmd/internal/dwarf/dwarf.go b/src/cmd/internal/dwarf/dwarf.go
index df80039063..ad8344ebe3 100644
--- a/src/cmd/internal/dwarf/dwarf.go
+++ b/src/cmd/internal/dwarf/dwarf.go
@@ -434,7 +434,7 @@ var abbrevs = [DW_NABRV]dwAbbrev{
                        {DW_AT_low_pc, DW_FORM_addr},
                        {DW_AT_high_pc, DW_FORM_addr},
                        {DW_AT_call_file, DW_FORM_data4},
-                       {DW_AT_call_line, DW_FORM_udata},
+                       {DW_AT_call_line, DW_FORM_sdata},
                },
        },

@@ -446,7 +446,7 @@ var abbrevs = [DW_NABRV]dwAbbrev{
                        {DW_AT_abstract_origin, DW_FORM_ref_addr},
                        {DW_AT_ranges, DW_FORM_sec_offset},
                        {DW_AT_call_file, DW_FORM_data4},
-                       {DW_AT_call_line, DW_FORM_udata},
+                       {DW_AT_call_line, DW_FORM_sdata},
                },
        },

@@ -1212,7 +1212,7 @@ func PutInlinedFunc(ctxt Context, s *FnState, callersym Sym, callIdx int) error

        // Emit call file, line attrs.
        ctxt.AddFileRef(s.Info, ic.CallFile)
-       putattr(ctxt, s.Info, abbrev, DW_FORM_udata, DW_CLS_CONSTANT, int64(ic.CallLine), nil)
+       putattr(ctxt, s.Info, abbrev, DW_FORM_sdata, DW_CLS_CONSTANT, int64(ic.CallLine), nil)

        // Variables associated with this inlined routine instance.
        vars := ic.InlVars

I don't know what the implications of these changes are and whether they would be ok to upstream.

tmm1 commented

FYI this was Apple's response about using udata vs data1:

In some scenarios, using the fixed size data forms can be more compact. The workaround suggested here is partly because this is what clang uses, and clang is using data1 and data2 because it’s a more compact representation when variable length encoding isn’t necessary. Your choice of using udata may be valid for your exact situation, and is valid DWARF that symbols should handle.

tmm1 commented

@champo Awesome! Does the test I added in https://go-review.googlesource.com/c/go/+/170451 pass with your changes?

@tmm1 I'm afraid not :( It seems that this patch only works using -w -s so it doesn't entirely solve the problem.

Change https://golang.org/cl/174538 mentions this issue: cmd/compile: emit DWARF call_line attrs with data4 form on iOS

thanm commented

@tmm1 You mentioned previously that you filed a bug with Apple on this issue. Is there a bug number you could share, so that I can monitor it? If Apple fixes the bug I would like to find out, so that I can consider updating the Go compiler.

tmm1 commented

Sure, it's rdar://49599876

This seems to fix the bug, at least for me.

CGO_CFLAGS=-g go build -buildmode=c-archive -o test.a -ldflags=-compressdwarf=false piu.go
echo ar xv test.a
ar xv test.a
symbols go.o | more

begins

go.o [x86_64, 0.008977 seconds]:
    null-uuid                            /Users/drchase/work/tmp/go.o [OBJECT, FaultedFromDisk]  
        0x0000000000000000 (0x3048c7)  SEGMENT
            0x0000000000000000 ( 0x9cdb8) __TEXT __text
                0x0000000000000000 (    0x70) go.buildid [FUNC, NameNList, MangledNameNList, Merged, NList] 
                0x0000000000000070 (   0x130) sync/atomic.(*Value).Store [FUNC, LENGTH, NameNList, MangledNameNList, Merged, NList, Dwarf] 
                    0x0000000000000070 (    0x13) value.go:45
                    0x0000000000000083 (     0xe) value.go:45
                    0x0000000000000091 (     0x6) value.go:46

piu.go (same file as above):

package main

import "C"
import "log"

func main() {
    log.Printf("foo")
}

Discarding the proposed fix, I get(end-to-end):

~/work/tmp$ CGO_CFLAGS=-g go build -buildmode=c-archive -o test.a -ldflags=-compressdwarf=false piu.go
~/work/tmp$ echo ar xv test.a
ar xv test.a
~/work/tmp$ ar xv test.a
x - __.SYMDEF SORTED
x - go.o
x - 000000.o
x - 000001.o
x - 000002.o
x - 000003.o
x - 000004.o
x - 000005.o
x - 000006.o
x - 000007.o
x - 000008.o
x - 000009.o
x - 000010.o
~/work/tmp$ symbols go.o | more
go.o [x86_64, 0.001590 seconds]:
    null-uuid                            /Users/drchase/work/tmp/go.o [OBJECT, Empty]  
        0x0000000000000000 (0x303336)  SEGMENT

I probably ought to concoct a test for this.

Change https://golang.org/cl/170451 mentions this issue: cmd/link/internal/ld: ensure mach-o debug symbols are visible to App Store

tmm1 commented

I received an update from Apple today regarding another possible problem with our DWARF output. I think this is based on the original go.o I sent them so it may be fixed already.

While looking into this issue with the symbols tool, the symbols engineering team noticed a few other issues with your DWARF construction, and asked me to pass this information on to you.

DWARF has a notion of ownership where one entity can “own” another one, and this is a useful concept to have when building debugging tools. One of the ways the ownership is expressed is by expressing a range. For example, if entity A owns the range 0-100, then any of the other entities it owns should have a range that is between 0-100. Thus, if entity A claims ownership on entity B, but entity B has a range of 100-110, then this raises a few red flags regarding the validity of the generated DWARF. You can see the currently shipping tools flag this with the -verify option with one of the files you gave us:

$ dwarfdump -verbose -verify go.o.dwarf 
error: DIE address ranges are not contained in its parent's ranges:
thanm commented

Thanks for the update. I will take a look.

thanm commented

For the specific case of the relocatable object I'm not sure if the errors being reported make sense, since the ranges in question are covered by relocations and the verifier itself doesn't seem to be reading or applying relocations.

When I do a regular link of the program in question I do still see some issues, however, meaning that the verifier does seem to be catching potential problems. I filed a new issue to track:

#33188

thanm commented

Can this bug be closed now, given the compiler change that was submitted? There is a tracking bug for the DWARF verifier issues that the Apple folks reported.