bug(dbus): New[User|System]ConnectionContext doesn't respect incoming context.Context
bisakhmondal opened this issue · 4 comments
Hi Community,
I a recent event, I just discovered that the two functions to establish dbus connection doesn't handle context deadline event
func TestGoSystemd(t *testing.T) {
ctx, _ := context.WithTimeout(context.Background(), 0)
_, err := dbus.NewSystemConnectionContext(ctx)
if err != nil {
t.Fatal(err)
}
}
this throws:
=== RUN TestGoSystemd
dbus_test.go:13: read unix @->/run/dbus/system_bus_socket: use of closed network connection
--- FAIL: TestGoSystemd (0.00s)
Same for
ctx, _ := context.WithTimeout(context.Background(), 0)
_, err := dbus.NewUserConnectionContext(ctx)
// : read unix @->/run/user/1000/bus: use of closed network connection
And when integrated with large software, it makes debugging a lot harder. I think here it's best to use standard context.DeadlineExceeded
error when such a scenario occurs. Thank you!
Thanks for the report.
It would be good to pinpoint where that error is being generated.
I think this may be coming from the underlying go-dbus, possibly by some methods not properly context-aware.
My raw guess is that is failing at some point during the Hello
or Auth
steps for dbus.
Hi @lucab, thanks for the reply. I just took a quick look with a debugger
The NewSystemConnectionContext calls
// 1
func NewSystemConnectionContext(ctx context.Context) (*Conn, error) {
return NewConnection(func() (*dbus.Conn, error) {
return dbusAuthHelloConnection(ctx, dbus.SystemBusPrivate)
})
}
// 2
func NewConnection(dialBus func() (*dbus.Conn, error)) (*Conn, error) {
sysconn, err := dialBus()
// so the go anonymous function gets called
func dbusAuthHelloConnection(ctx context.Context, createBus func(opts ...dbus.ConnOption) (*dbus.Conn, error)) (*dbus.Conn, error) {
conn, err := dbusAuthConnection(ctx, createBus)
// which in terms calls the go-dbus SystemBusPrivate method with conniption dbus with Context
conn, err := createBus(dbus.WithContext(ctx))
if err != nil {
return nil, err
}
Development Error 1 - go-dbus
Without checking context expiry the package just returned a new connection (image above) without any error.
Development Error 2 - go-dbus
The newConn
function from go-dbus
runs a watcher in a goroutine. While monitoring the context, upon ctx.Done()
event they just abruptly closed the underlying connection without throwing/caching the actual error.
Now all subsequent requests to the conn fail (as the connection is already closed). But the reason for closing the connection was never revealed.
ref:
https://github.com/godbus/dbus/blob/b357b44b7ab3bf9e9b27c906fb31cb622b7a017e/conn.go#L302-L306
Now I think we have two options:
- Not use the
dbus.WithContext(ctx)
as a connection option and implement our own watcher. It's not very complicated. I can help you with that.
one example: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/common/testexec/testexec.go (a context-aware exec)
As go-dbus is very good at providing the primitives (I think it should keep doing it that way instead of worrying about everything) but developers mostly care for the ready to use methods that go-systemd/dbus provides unless they are writing a dbus package. (we can tell them to remove dbus.WithContext(ctx)
as it doesn't work as expected).
- Make the changes in the go-dbus codebase.
Do let me know what you think. Thanks!
ping @lucab
It sounds like you have pinpointed some faulty logic in go-dbus
, so I'd suggest taking this discussion there and try to enhance that code. I'm not really keen on trying to add dbus workarounds here as they always increase maintenance complexity, no matter how simple they seem.
That said, I also fear that you may be chasing a ghost.
Context expiration happens asynchronously, so there is always going to be an implicit race between the context watcher (which closes the connection) and other I/O methods (which use the connection).
Even after improving the logic which creates a connection, I think you will always face this race later on (e.g. when using a very small but non-zero timeout). Depending on the non-deterministic result of this race, you may keep seeing I/O errors being returned when the connection gets closed.