4ad/go

dwarf64 parsing problems

Opened this issue · 3 comments

Executive Summary:

Problem exists in all versions of golang that I checked up to 1.8.3, but
it's related to Solaris support, so I'm filing the issue here.
I found the problem on an x86/Solaris11 system, but the issue should
affect any system that has or generates dwarf64 objects.

The problem shows up as a failure in cmd/link TestDWARF. It only appears
when the "debug" versions of operating system files (crt objects) are installed.
In this case, those debug files are created by the Studio compilers, and contain
dwarf64 debug data. On systems without the "debug" versions of these files,
the problem is not seen.

I've identified two issues so far, and created fixes for them. An additional issue
with reading this dwarf information is still not resolved.

This will be hard for anyone else to reproduce unless they have a solaris
image with dwarf64 debug information in /usr/lib/amd64/crt1.o

Details:

On some Solaris systems TestDWARF in cmd/link fails.

% go test cmd/link
--- FAIL: TestDWARF (8.04s)
   --- FAIL: TestDWARF/testprogcgo (5.37s)
       dwarf_test.go:81: decoding dwarf section info at offset 
                           0x4: cannot determine byte order

This happens when cgo-generated binaries include dwarf information
from Studio compilers. In the case where this was discovered,
the Solaris system had debug information in some system crt .o files,
which got included in the ELF output file in the test.

I used the following change to the test to capture the binary:

--- a/src/cmd/link/dwarf_test.go
+++ b/src/cmd/link/dwarf_test.go
@@ -50,6 +50,7 @@ func TestDWARF(t *testing.T) {
                        exe := filepath.Join(tmpDir, prog+".exe")
                        dir := "../../runtime/testdata/" + prog
                        out, err := exec.Command(testenv.GoToolPath(t), "build", "-o", exe, dir).CombinedOutput()
+                       exec.Command("/bin/cp", exe, "/tmp/cqexe").Run()
                        if err != nil {
                                t.Fatalf("go build -o %v %v: %v\n%s", exe, dir, err, out)
                        }

Inspecting the dwarfdump output from the binary shows that the first two compile units
in .debug_info are for crt1.o and values-Xa.o, which don't normally have debug info.

The Studio dwarf output is interesting in two ways. 1) It uses the dwarf64 format,
and the golang modules for reading dwarf have a few bugs for dwarf64. 2) It
has extra padding at the end of a compile unit. The padding is not easy to separate
from the start of the next compile unit block.

http://www.dwarfstd.org/doc/DWARF4.pdf

So I fixed two things that appear to be bugs in the treatment of dwarf64 vs dwarf32,
but the test still fails: reading the second compile unit goes wrong
because of the padding at the end of the first compile unit.

Here is one fix for the code that peeks ahead into the dwarf to find its byte order.

--- a/src/debug/dwarf/open.go
+++ b/src/debug/dwarf/open.go
@@ -58,7 +58,14 @@ func New(abbrev, aranges, frame, info, line, pubnames, ranges, str []byte) (*Dat
        if len(d.info) < 6 {
                return nil, DecodeError{"info", Offset(len(d.info)), "too short"}
        }
+        // 32-bit dwarf has 4 byte size then 2 byte version
        x, y := d.info[4], d.info[5]
+        if len(d.info) >= 14 && d.info[0] == 255 && d.info[1] == 255 && d.info[2] == 255 && d.info[3] == 255 {
+           // 64-bit DWARF has 4 bytes of FF, then 8 byte size,
+           // then 2 byte version
+           x = d.info[12]
+           y = d.info[13]
+        }
        switch {
        case x == 0 && y == 0:
                return nil, DecodeError{"info", 4, "unsupported version 0"}
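
The peek-ahead logic can also be tried outside the package. The following is a minimal standalone sketch, not the debug/dwarf code itself: the names versionOffset and byteOrder are made up for illustration. It assumes the 64-bit escape is the full four bytes 0xffffffff and checks the slice length before indexing bytes 12 and 13.

```go
package main

import (
	"errors"
	"fmt"
)

// versionOffset returns the index of the 2-byte version field in the first
// unit header of a .debug_info section. Hypothetical helper for illustration.
func versionOffset(info []byte) (int, error) {
	if len(info) < 6 {
		return 0, errors.New("info too short")
	}
	// 64-bit DWARF starts with the 4-byte escape 0xffffffff,
	// followed by an 8-byte unit length, then the version.
	if info[0] == 0xff && info[1] == 0xff && info[2] == 0xff && info[3] == 0xff {
		if len(info) < 14 {
			return 0, errors.New("info too short for 64-bit DWARF header")
		}
		return 12, nil
	}
	// 32-bit DWARF: 4-byte unit length, then the version.
	return 4, nil
}

// byteOrder guesses endianness from the version field. The version is a
// small number, so one of its two bytes should be zero.
func byteOrder(info []byte) (string, error) {
	off, err := versionOffset(info)
	if err != nil {
		return "", err
	}
	x, y := info[off], info[off+1]
	switch {
	case x == 0 && y == 0:
		return "", errors.New("unsupported version 0")
	case x == 0:
		return "big", nil
	case y == 0:
		return "little", nil
	}
	return "", errors.New("cannot determine byte order")
}

func main() {
	// Little-endian 64-bit DWARF header prefix: escape, length 0x1234, version 4.
	// Without the 64-bit check, bytes 4 and 5 (0x34, 0x12) would be read
	// as the version and neither is zero.
	le64 := []byte{0xff, 0xff, 0xff, 0xff, 0x34, 0x12, 0, 0, 0, 0, 0, 0, 4, 0}
	fmt.Println(byteOrder(le64))
}
```

With a 32-bit-only peek, bytes 4 and 5 of a dwarf64 section are the low bytes of the unit length, which is how the "cannot determine byte order" error can arise.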

I started to make a fix for another dwarf32/64 difference, but I'm not sure it's
a good fix. There is a section offset in the compile_unit header which is a
different size in dwarf64, so I handled that. But I'm not sure if all
the buffer logic can handle 64-bit offsets or not. So after reading the 64-bit
section offset I truncate it to 32-bits and continue on as before.

--- a/src/debug/dwarf/unit.go
+++ b/src/debug/dwarf/unit.go
@@ -67,7 +67,14 @@ func (d *Data) parseUnits() ([]unit, error) {
                        break
                }
                u.vers = int(vers)
-               atable, err := d.parseAbbrev(b.uint32(), u.vers)
+               var aoff uint32
+               if !u.is64 {
+                       aoff = b.uint32()
+               } else {
+                       // For now ignore the high bits
+                       aoff = uint32(b.uint64())
+               }
+               atable, err := d.parseAbbrev(aoff, u.vers)
                if err != nil {
                        if b.err == nil {
                                b.err = err
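
For reference, here is a rough standalone sketch of the header layout this change deals with. It is not the debug/dwarf implementation: the unitHeader type and parseUnitHeader function are hypothetical, and the field order (version, abbrev offset, address size) assumes DWARF versions 2 through 4. Unlike the patch above, it keeps the full 64-bit abbrev offset so a caller can decide whether to reject values that do not fit in 32 bits.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// unitHeader holds the fields of a DWARF 2-4 compile unit header.
// Hypothetical type for illustration only.
type unitHeader struct {
	is64      bool
	length    uint64 // unit length, not counting the initial length field
	version   uint16
	abbrevOff uint64 // 4 bytes on disk in 32-bit DWARF, 8 bytes in 64-bit
	addrSize  uint8
	size      int // total header size in bytes
}

func parseUnitHeader(b []byte, bigEndian bool) (unitHeader, error) {
	var order binary.ByteOrder = binary.LittleEndian
	if bigEndian {
		order = binary.BigEndian
	}
	var h unitHeader
	if len(b) < 4 {
		return h, errors.New("header too short")
	}
	n, offSize := 4, 4
	h.length = uint64(order.Uint32(b))
	if h.length == 0xffffffff {
		// 64-bit DWARF: the real length follows the escape value.
		if len(b) < 12 {
			return h, errors.New("64-bit header too short")
		}
		h.is64 = true
		h.length = order.Uint64(b[4:])
		n, offSize = 12, 8
	}
	if len(b) < n+2+offSize+1 {
		return h, errors.New("header truncated")
	}
	h.version = order.Uint16(b[n:])
	n += 2
	if h.is64 {
		h.abbrevOff = order.Uint64(b[n:])
	} else {
		h.abbrevOff = uint64(order.Uint32(b[n:]))
	}
	n += offSize
	h.addrSize = b[n]
	h.size = n + 1
	return h, nil
}

func main() {
	hdr64 := []byte{
		0xff, 0xff, 0xff, 0xff, // 64-bit escape
		0x20, 0, 0, 0, 0, 0, 0, 0, // unit length
		4, 0, // version
		0, 1, 0, 0, 0, 0, 0, 0, // abbrev offset (8 bytes)
		8, // address size
	}
	h, err := parseUnitHeader(hdr64, false)
	fmt.Printf("%+v %v\n", h, err)
}
```

The header is 11 bytes in the 32-bit format and 23 bytes in the 64-bit format, so reading a 4-byte abbrev offset from a dwarf64 unit leaves every later field misaligned.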

After making those two fixes, I ran into the problem with the trailing padding.
Unfortunately, to see the padding you have to visually parse the raw hex dump
of the compile unit, which is painful for people with dwarf experience and
probably impossible for others.

In this case, the crt file is the first compile_unit in the executable and it
has only one DIE. Here is what I did:

% elfdump -N .debug_info -w ~/di /usr/lib/amd64/crt1.o
% dwarfdump /usr/lib/amd64/crt1.o > ~/dw
% od -A x -c ~/di | tail
0000250   /   s   e   r   v   e   r   -   p   r   o   t   o   /   r   o
0000260   o   t   _   i   3   8   6   _   s   t   u   b   /   u   s   r
0000270   /   i   n   c   l   u   d   e       -   c           .   .   /
0000280   c   o   m   m   o   n   /   c   r   t   1   .   c  \0   X   a
0000290   ;   O   ;   P   ;   R   =   5   .   1   3   <   <   S   u   n
00002a0       C       5   .   1   3       S   u   n   O   S   _   i   3
00002b0   8   6       P   a   t   c   h       1   5   1   6   3   3   -
00002c0   0   4       2   0   1   5   /   1   1   /   0   6   >   >   ;
00002d0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
00002da

If you look at the file ~/dw you'll be able to see the last two attributes
are a string ending in ">>;" and then a dwarf attribute which is an 8-byte
offset. The hex dump shows two additional zero bytes at the end of the section
data that shouldn't really be there.
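
As an alternative to reading the hex dump by eye, a small program can walk the unit lengths and report how many bytes fall outside the last complete unit. This is only a sketch under assumptions: unitOffsets is a hypothetical helper, the sample bytes in main are invented rather than taken from crt1.o, and a zero unit length is treated as the end of the real data.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// unitOffsets walks the unit headers in a .debug_info section. It returns
// the start offset of each unit and the number of bytes left over after the
// last complete unit. Hypothetical helper, not part of debug/dwarf.
func unitOffsets(info []byte, bigEndian bool) ([]int, int, error) {
	var order binary.ByteOrder = binary.LittleEndian
	if bigEndian {
		order = binary.BigEndian
	}
	var offs []int
	off := 0
	for len(info)-off >= 4 {
		hdr := 4 // size of the initial length field
		length := uint64(order.Uint32(info[off:]))
		if length == 0xffffffff {
			// 64-bit DWARF: 4-byte escape plus 8-byte length.
			if len(info)-off < 12 {
				return offs, len(info) - off, errors.New("truncated 64-bit initial length")
			}
			hdr = 12
			length = order.Uint64(info[off+4:])
		}
		if length == 0 {
			// Assumption: a zero length means padding, not a unit.
			break
		}
		if length > uint64(len(info)-off-hdr) {
			return offs, len(info) - off, errors.New("unit length runs past end of section")
		}
		offs = append(offs, off)
		off += hdr + int(length)
	}
	return offs, len(info) - off, nil
}

func main() {
	var info []byte
	// Unit 1: 32-bit format, 6 bytes of body.
	info = append(info, 6, 0, 0, 0)
	info = append(info, make([]byte, 6)...)
	// Unit 2: 64-bit format, 3 bytes of body.
	info = append(info, 0xff, 0xff, 0xff, 0xff, 3, 0, 0, 0, 0, 0, 0, 0)
	info = append(info, 1, 2, 3)
	// Two stray bytes after the last unit.
	info = append(info, 0, 0)

	offs, extra, err := unitOffsets(info, false)
	fmt.Println("unit offsets:", offs, "leftover bytes:", extra, "err:", err)
}
```

A nonzero leftover count, or a unit that starts somewhere unexpected, points at either real padding or a mistake in the length arithmetic.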

Background: I believe the padding is a Studio-specific glitch that arose because someone thought
they should be able to read the compile unit header without constructing
it byte-by-byte. On sparc, this means it must be aligned, but the format
of debug_info doesn't guarantee the header will be aligned when multiple units
are concatenated. So I think that's where the padding came from, and it's likely
that other dwarf generators don't create it, and it's likely that other dwarf readers
don't account for it.

Getting this padding changed in the Studio compilers will take time
and energy, and we'll still be dealing with older compilers, so I'm going to assume
we want to add a heuristic to deal with the padding in the golang code.
I haven't written that yet, but it shouldn't be too bad.

Notes on fixing the abbrev table offset:

It's possible that some compilers don't faithfully implement every
aspect of the dwarf spec. So it wouldn't hurt to
double check which compilers produce dwarf64 format (under -m32 and/or -m64)
and when they do, whether they use a 64-bit abbrev table offset.

4ad commented

Thanks for the detailed analysis.

The good news is that the yucky padding issue was just me misreading the hex dump. The problem was somewhere else. The other good news is that I found the bug, and the test passes with our dwarf64 files. The embarrassing news is that I spent a long time adding print statements and looking at more hex dumps when I should have just desk-checked a simple loop. sigh. But I've been there before and I'll be there again. :-) I'll get a complete set of diffs up here soon.

Here's the pull request for this issue:
#27