coreos/go-systemd

bug(dbus): New[User|System]ConnectionContext doesn't respect incoming context.Context

bisakhmondal opened this issue · 4 comments

Hi Community,
I a recent event, I just discovered that the two functions to establish dbus connection doesn't handle context deadline event

func TestGoSystemd(t *testing.T) {
	ctx, _ := context.WithTimeout(context.Background(), 0)
	_, err := dbus.NewSystemConnectionContext(ctx)
	if err != nil {
		t.Fatal(err)
	}
}

this throws:

=== RUN   TestGoSystemd
    dbus_test.go:13: read unix @->/run/dbus/system_bus_socket: use of closed network connection
--- FAIL: TestGoSystemd (0.00s)

Same for

ctx, _ := context.WithTimeout(context.Background(), 0)
_, err := dbus.NewUserConnectionContext(ctx)
// : read unix @->/run/user/1000/bus: use of closed network connection

And when integrated with large software, it makes debugging a lot harder. I think here it's best to use standard context.DeadlineExceeded error when such a scenario occurs. Thank you!

lucab commented

Thanks for the report.
It would be good to pinpoint where that error is being generated.
I think this may be coming from the underlying go-dbus, possibly by some methods not properly context-aware.
My raw guess is that is failing at some point during the Hello or Auth steps for dbus.

Hi @lucab, thanks for the reply. I just took a quick look with a debugger
The NewSystemConnectionContext calls

// 1
func NewSystemConnectionContext(ctx context.Context) (*Conn, error) {
	return NewConnection(func() (*dbus.Conn, error) {
		return dbusAuthHelloConnection(ctx, dbus.SystemBusPrivate)
	})
}
// 2
func NewConnection(dialBus func() (*dbus.Conn, error)) (*Conn, error) {
	sysconn, err := dialBus()

// so the go anonymous function gets called
func dbusAuthHelloConnection(ctx context.Context, createBus func(opts ...dbus.ConnOption) (*dbus.Conn, error)) (*dbus.Conn, error) {
	conn, err := dbusAuthConnection(ctx, createBus)

// which in terms calls the go-dbus SystemBusPrivate method with conniption dbus with Context
conn, err := createBus(dbus.WithContext(ctx))
	if err != nil {
		return nil, err
	}

image

Development Error 1 - go-dbus

Without checking context expiry the package just returned a new connection (image above) without any error.

Development Error 2 - go-dbus

The newConn function from go-dbus runs a watcher in a goroutine. While monitoring the context, upon ctx.Done() event they just abruptly closed the underlying connection without throwing/caching the actual error.
Now all subsequent requests to the conn fail (as the connection is already closed). But the reason for closing the connection was never revealed.

ref:

https://github.com/godbus/dbus/blob/b357b44b7ab3bf9e9b27c906fb31cb622b7a017e/conn.go#L302-L306

Now I think we have two options:

  1. Not use the dbus.WithContext(ctx) as a connection option and implement our own watcher. It's not very complicated. I can help you with that.
    one example: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/common/testexec/testexec.go (a context-aware exec)

As go-dbus is very good at providing the primitives (I think it should keep doing it that way instead of worrying about everything) but developers mostly care for the ready to use methods that go-systemd/dbus provides unless they are writing a dbus package. (we can tell them to remove dbus.WithContext(ctx) as it doesn't work as expected).

  1. Make the changes in the go-dbus codebase.

Do let me know what you think. Thanks!

lucab commented

It sounds like you have pinpointed some faulty logic in go-dbus, so I'd suggest taking this discussion there and try to enhance that code. I'm not really keen on trying to add dbus workarounds here as they always increase maintenance complexity, no matter how simple they seem.

That said, I also fear that you may be chasing a ghost.
Context expiration happens asynchronously, so there is always going to be an implicit race between the context watcher (which closes the connection) and other I/O methods (which use the connection).
Even after improving the logic which creates a connection, I think you will always face this race later on (e.g. when using a very small but non-zero timeout). Depending on the non-deterministic result of this race, you may keep seeing I/O errors being returned when the connection gets closed.