google/capslock

Path elements in JSON output are not consistent between runs

efd6 opened this issue · 3 comments

efd6 commented

I'm using capslock (probably in a way that is not intended) to identify module-wide imports that are org-foreign. This is a cheap hack implemented here, with obvious false-positive issues.

One of the extensions that I was interested in making was to use the path output to identify the specific syscalls that are made when a CAPABILITY_SYSTEM_CALLS is found. This requires examining the call path array in the capslock JSON output so that the AST of the package can be examined for the actual args of the syscall call. While looking into this, I found that the order of the input packages to capslock affects the JSON output, and the final sites identified in the path are not stable between runs (the summary output and caps counts are stable).
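For anyone wanting to try the same kind of path inspection, here is a minimal sketch of the post-processing step. It is not the linked playground program. The JSON field names (capabilityInfo, path, name, site, filename, line, column) are assumptions inferred from the structs printed in this issue, and it also assumes that the site recorded on the final path element is where its caller invokes it; if capslock records the site on the caller instead, read it from the second-to-last element.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Assumed shape of capslock's JSON output. Field names are inferred from the
// structs printed in this issue and may not match the real schema exactly.
type site struct {
	Filename string      `json:"filename"`
	Line     json.Number `json:"line"`   // json.Number accepts both 113 and "113"
	Column   json.Number `json:"column"` // same as above
}

type function struct {
	Name string `json:"name"`
	Site site   `json:"site"`
}

type capabilityInfo struct {
	Capability string     `json:"capability"`
	Path       []function `json:"path"`
}

type report struct {
	CapabilityInfo []capabilityInfo `json:"capabilityInfo"`
}

// syscallCallers maps the final function of each CAPABILITY_SYSTEM_CALLS path
// to a sorted list of "caller at file:line:column" strings.
func syscallCallers(data []byte) (map[string][]string, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	out := make(map[string][]string)
	for _, ci := range r.CapabilityInfo {
		if ci.Capability != "CAPABILITY_SYSTEM_CALLS" || len(ci.Path) < 2 {
			continue
		}
		callee := ci.Path[len(ci.Path)-1]
		caller := ci.Path[len(ci.Path)-2]
		// Assumption: the callee's site is the location of the call in the caller.
		s := callee.Site
		out[callee.Name] = append(out[callee.Name],
			fmt.Sprintf("%s at %s:%s:%s", caller.Name, s.Filename, s.Line, s.Column))
	}
	// Sort each caller list in place so repeated runs print in the same order.
	for _, callers := range out {
		sort.Strings(callers)
	}
	return out, nil
}

func main() {
	// Hypothetical input, shaped like the output shown in this issue.
	const sample = `{"capabilityInfo":[{"capability":"CAPABILITY_SYSTEM_CALLS","path":[
		{"name":"github.com/insomniacslk/dhcp/dhcpv4.MakeListeningSocket"},
		{"name":"syscall.Bind","site":{"filename":"client.go","line":"113","column":"23"}}]}]}`
	callers, err := syscallCallers([]byte(sample))
	if err != nil {
		panic(err)
	}
	for name, sites := range callers {
		fmt.Println(name, sites)
	}
}
```

The site found this way is what would be handed to an AST lookup to recover the actual syscall arguments, which is why unstable sites make the approach unreliable.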

To demonstrate this, here are example runs of capslock with a post-processor available at https://play.golang.com/p/JoC-mB_NZYA (built as an executable named syscalls). The tests were run in the packetbeat/protos directory of http://github.com/elastic/beats.

Different package order (note that the missing unix.CmsgSpace callsite is consistent between runs):

$ capslock -goos darwin -goarch amd64 -output json -packages github.com/insomniacslk/dhcp/dhcpv4,github.com/miekg/dns 2>/dev/null | syscalls
syscall.Bind: {{Name:github.com/insomniacslk/dhcp/dhcpv4.MakeListeningSocket Site:{Filename:client.go Line:113 Column:23}}}
syscall.Sendto: {{Name:github.com/insomniacslk/dhcp/dhcpv4.BroadcastSendReceive Site:{Filename:client.go Line:213 Column:25}}}
syscall.SetsockoptInt: {{Name:github.com/insomniacslk/dhcp/dhcpv4.BindToInterface Site:{Filename:bindtodevice_darwin.go Line:15 Column:30}}, {Name:github.com/insomniacslk/dhcp/dhcpv4.MakeBroadcastSocket Site:{Filename:client.go Line:81 Column:29}}}
$ capslock -goos darwin -goarch amd64 -output json -packages github.com/miekg/dns,github.com/insomniacslk/dhcp/dhcpv4 2>/dev/null | syscalls
golang.org/x/sys/unix.CmsgLen: {{Name:golang.org/x/net/internal/socket.controlHeaderLen Site:{Filename:cmsghdr_unix.go Line:13 Column:21}}}
golang.org/x/sys/unix.CmsgSpace: {{Name:golang.org/x/net/internal/socket.controlMessageSpace Site:{Filename:cmsghdr_unix.go Line:21 Column:23}}}
golang.org/x/sys/unix.SetsockoptInt: {{Name:github.com/miekg/dns.reuseportControl$1 Site:{Filename:listen_reuseport.go Line:19 Column:29}}}
syscall.Bind: {{Name:github.com/insomniacslk/dhcp/dhcpv4.MakeListeningSocket Site:{Filename:ipsock.go Line:180 Column:2}}}
syscall.Sendto: {{Name:github.com/insomniacslk/dhcp/dhcpv4.BroadcastSendReceive Site:{Filename:ipsock.go Line:278 Column:35}}}
syscall.SetsockoptInt: {{Name:github.com/insomniacslk/dhcp/dhcpv4.BindToInterface Site:{Filename:ipsock.go Line:296 Column:42}}, {Name:github.com/insomniacslk/dhcp/dhcpv4.MakeBroadcastSocket Site:{Filename:ipsock.go Line:151 Column:19}}}

Single package, multiple runs:

$ capslock -goos darwin -goarch amd64 -output json -packages github.com/miekg/dns 2>/dev/null | syscalls
golang.org/x/sys/unix.CmsgLen: {{Name:golang.org/x/net/internal/socket.controlHeaderLen Site:{Filename:cmsghdr_unix.go Line:13 Column:21}}}
golang.org/x/sys/unix.CmsgSpace: {{Name:golang.org/x/net/internal/socket.controlMessageSpace Site:{Filename:cmsghdr_unix.go Line:21 Column:23}}}
golang.org/x/sys/unix.SetsockoptInt: {{Name:github.com/miekg/dns.reuseportControl$1 Site:{Filename:listen_reuseport.go Line:19 Column:29}}}
$ capslock -goos darwin -goarch amd64 -output json -packages github.com/miekg/dns 2>/dev/null | syscalls
golang.org/x/sys/unix.CmsgLen: {{Name:golang.org/x/net/internal/socket.controlHeaderLen Site:{Filename:cmsghdr_unix.go Line:13 Column:21}}}
golang.org/x/sys/unix.CmsgSpace: {{Name:golang.org/x/net/internal/socket.controlMessageSpace Site:{Filename:cmsghdr_unix.go Line:21 Column:23}}}
golang.org/x/sys/unix.SetsockoptInt: {{Name:github.com/miekg/dns.reuseportControl$1 Site:{Filename:listen_reuseport.go Line:19 Column:29}}}

I've noticed the same behavior when passing multiple packages to capslock. I've also noticed that not just the order of capabilities differs between capslock runs, but the contents of the capabilities themselves differ. Running capslock twice with the same arguments against the exact same source code can result in totally different output, even when sorted.

As an example, when running capslock -packages google.golang.org/api/admin/directory/v1,google.golang.org/api/cloudidentity/v1,google.golang.org/api/iterator,google.golang.org/api/sqladmin/v1beta4 -output j in the root directory of https://github.com/gravitational/teleport and sorting the found capabilities, the hash and even the size of the two processed outputs differ.

Code I used to sort the output: https://go.dev/play/p/kF3ARegvxKj (it expects two args: the path to the capslock output JSON file and a path to write the sorted output to).
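The idea behind that comparison can be sketched roughly as follows. This is not the linked playground code: the entry struct is a simplified, hypothetical stand-in for a capslock capability record (package, capability, and the function names on the path), and the sort key is just one reasonable choice. Sorting before hashing removes ordering differences, so any remaining hash mismatch means the contents themselves differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// entry is a simplified, hypothetical stand-in for one capability record.
type entry struct {
	PackageName string   `json:"packageName"`
	Capability  string   `json:"capability"`
	Path        []string `json:"path"` // function names only
}

// key builds a total ordering key from every field of an entry.
func key(e entry) string {
	return e.PackageName + "\x00" + e.Capability + "\x00" + strings.Join(e.Path, "\x00")
}

// canonicalHash sorts a copy of entries and returns the SHA-256 of its JSON
// encoding, so two runs that differ only in order produce the same hash.
func canonicalHash(entries []entry) (string, error) {
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return key(sorted[i]) < key(sorted[j]) })
	b, err := json.Marshal(sorted)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := entry{"github.com/miekg/dns", "CAPABILITY_SYSTEM_CALLS", []string{"f", "g"}}
	b := entry{"github.com/miekg/dns", "CAPABILITY_NETWORK", []string{"f", "h"}}
	h1, _ := canonicalHash([]entry{a, b})
	h2, _ := canonicalHash([]entry{b, a})
	fmt.Println(h1 == h2) // same entries in a different order
}
```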

After running capslock with the args above and processing the output of each run, comparing the processed output confirms they are different:

$ sha256sum sorted*
4a81e59c288df75face6790e683674b34dcb554d724fa15a80281fa8f1b33eaa  sorted1.json
4cddf2ab60836109285e15107260ff813d48846faf3dc77f0c8af9aed3da34e6  sorted2.json
$ ls -la sorted*
-rw-r--r-- 1 capnspacehook capnspacehook 2752438 Oct  9 18:43 sorted1.json
-rw-r--r-- 1 capnspacehook capnspacehook 2758625 Oct  9 18:43 sorted2.json

jcd2 commented

Thanks for the detailed reports! It looks like we have two underlying problems: the lack of determinism in the choice of example call paths, which causes spurious diffs in the output, and incorrect mapping of call sites to source code locations when more than one input package is specified. We're looking into this.
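To illustrate the first problem in general terms (this is not capslock's actual implementation, and the graph representation and function names are hypothetical): when several call paths of equal length reach the same function, a search that iterates over a Go map can pick a different one on each run, because map iteration order is randomized. Visiting callees in sorted order breaks ties the same way every time.

```go
package main

import (
	"fmt"
	"sort"
)

// shortestPath returns a shortest call path from start to target in a call
// graph stored as caller -> set of callees. Callees are visited in sorted
// order, so ties between equally short paths are broken identically each run.
func shortestPath(graph map[string]map[string]bool, start, target string) []string {
	prev := map[string]string{}
	seen := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == target {
			// Walk the predecessor links back to start.
			var path []string
			for n := target; ; n = prev[n] {
				path = append([]string{n}, path...)
				if n == start {
					return path
				}
			}
		}
		// Ranging over graph[cur] directly would visit callees in random order.
		callees := make([]string, 0, len(graph[cur]))
		for c := range graph[cur] {
			callees = append(callees, c)
		}
		sort.Strings(callees)
		for _, c := range callees {
			if !seen[c] {
				seen[c] = true
				prev[c] = cur
				queue = append(queue, c)
			}
		}
	}
	return nil // target is not reachable from start
}

func main() {
	// Two equally short paths lead to syscall.Bind.
	g := map[string]map[string]bool{
		"main": {"a": true, "b": true},
		"a":    {"syscall.Bind": true},
		"b":    {"syscall.Bind": true},
	}
	fmt.Println(shortestPath(g, "main", "syscall.Bind"))
}
```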

efd6 commented

An issue related to this is that when there are replace directives in the go.mod, the replaced package name is used, rather than the replacing package. This can be worked around after the fact, but can be confusing.
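One possible shape for that after-the-fact workaround, as a rough sketch: rewrite package paths in the output using a table of replaced to replacing module paths. The table is assumed to be filled in by hand from the replace directives in go.mod, and the module paths below are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// rewritePackage returns pkg with the longest matching replaced module path
// swapped for its replacement. replace maps old module path -> new module path.
func rewritePackage(pkg string, replace map[string]string) string {
	best := ""
	for old := range replace {
		// Match the module itself or a package beneath it, never a sibling
		// module that merely shares a string prefix.
		if pkg != old && !strings.HasPrefix(pkg, old+"/") {
			continue
		}
		if len(old) > len(best) {
			best = old
		}
	}
	if best == "" {
		return pkg
	}
	return replace[best] + strings.TrimPrefix(pkg, best)
}

func main() {
	// Hypothetical replace directive: example.com/upstream/lib => example.com/fork/lib
	replace := map[string]string{"example.com/upstream/lib": "example.com/fork/lib"}
	fmt.Println(rewritePackage("example.com/upstream/lib/sub", replace))
}
```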