grpc/grpc-swift

Connection management seems to just stop

ericsink opened this issue · 8 comments

This might be a bug, but I'm not sure, so I'm logging it as a question.

The context is an iOS app. I've seen the issue with grpc-swift 1.19.0 and 1.19.1. The current tests are happening with iOS 17.0.1, but I'm pretty sure I was seeing it with iOS 16 as well.

The problem is that sometimes grpc seems to just stop working, and the logging goes silent. The problem is not consistently reproducible, but in regular usage and testing of the app, it seems to happen every day or two.

I have both loggers in place at trace log level.

Here's an example from a couple days ago, starting with a successful session for context, and then showing how things seem to go wrong:

At 14:52 I woke up the iPad and brought the app to the foreground:

 2023-10-02T14:52:59-0500 info app : applicationWillEnterForeground
 2023-10-02T14:52:59-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/70 connectivity_state=ready deactivating connection
 2023-10-02T14:52:59-0500 debug grpc_bg : delay=0 poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/70 scheduling connection attempt
 2023-10-02T14:52:59-0500 info ConnectionStatus : ConnectivityDelegate: connectionClosed
 2023-10-02T14:52:59-0500 trace grpc_call : grpc.conn.addr_local=192.168.189.221 grpc_request_id=83CCA6BC-AED3-4997-B464-239F9272038B call_state=closed grpc.conn.addr_remote=18.116.194.70 failing buffered writes
 2023-10-02T14:52:59-0500 error tsk_active_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-02T14:52:59-0500 info tsk_active_games : top of loop

The grpc_bg label is the background activity logger, and grpc_call is the logger set in the call options. I also have a connectivity delegate that logs, under the ConnectionStatus label.

The entries for tsk_active_games correspond to a stream where the app is listening for items from the server. When the app wakes up, it catches the unavailable (14) error, considers it to be a transient condition, and goes back to the top of its loop to re-establish the stream. There are several more of those kinds of streams and loops in play. I've included the one above as an example and omitted the rest for brevity.
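
To make the loop structure concrete, each of them has roughly this shape (a sketch only, not the app's actual code; makeStream and handle are hypothetical stand-ins for the generated server-streaming call and the per-item handler):

    // Sketch of the loop shape described above. `makeStream` and `handle` are
    // hypothetical stand-ins; the real loops wrap specific generated calls.
    func runStreamLoop<S: AsyncSequence>(
        makeStream: () -> S,
        handle: (S.Element) -> Void
    ) async {
        while true {
            print("top of loop")
            do {
                var it = makeStream().makeAsyncIterator()
                print("await it.next()")
                while let item = try await it.next() {
                    handle(item)
                    print("await it.next()")
                }
            } catch {
                // e.g. GRPCStatus: unavailable (14): Transport became inactive;
                // treated as transient, so fall through and re-open the stream.
                print("CATCH \(error)")
            }
        }
    }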

The next interesting part of the log is:

 2023-10-02T14:52:59-0500 info ConnectionStatus : ConnectivityDelegate: startedConnecting
 2023-10-02T14:52:59-0500 debug grpc_call : pool.waiters.count=0 grpc_request_id=5FA1CA42-DD43-46AD-98EB-428F1EB1048C pool.waiter.id=ObjectIdentifier(0x0000000281dfbac0) waiting for a connection to become available
 2023-10-02T14:52:59-0500 trace grpc_bg : pool.waiters.count=1 poolmanager.id=ObjectIdentifier(0x000000028286a780) enqueued connection waiter

This too looks fine, as grpc_bg told us that it would be scheduling a connection attempt, and so it is.

After a bunch more of those waiters in the log, we see this:

 2023-10-02T14:52:59-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 making client bootstrap with event loop group of type NIOTSEventLoop
 2023-10-02T14:52:59-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 Network.framework is available and the EventLoopGroup is compatible with NIOTS, creating a NIOTSConnectionBootstrap
 2023-10-02T14:52:59-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 connectivity_state=connecting activating connection
 2023-10-02T14:52:59-0500 debug grpc_bg : tls_version=tlsv13 grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 grpc.conn.addr_local=192.168.189.221 poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc.conn.addr_remote=18.116.194.70 alpn=h2 TLS handshake completed
 2023-10-02T14:52:59-0500 debug grpc_bg : grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 h2_settings_max_concurrent_streams=128 grpc.conn.addr_remote=18.116.194.70 poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc.conn.addr_local=192.168.189.221 h2_settings_initial_window_size=65536 h2_settings_max_frame_size=16777215 HTTP2 settings update
 2023-10-02T14:52:59-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 connectivity_state=active connection ready
 2023-10-02T14:52:59-0500 info ConnectionStatus : ConnectivityDelegate: connectSucceeded
 2023-10-02T14:52:59-0500 trace grpc_bg : pool.waiters.count=10 poolmanager.id=ObjectIdentifier(0x000000028286a780) servicing waiters

This seems fine too. The connection was established, so now it needs to go deal with everyone who is waiting on it.

After this the log contains around 600 lines of stuff that all appears to be successful.

Along the way, we see:

 2023-10-02T14:53:01-0500 info app : willResignActiveNotification

and this:

 2023-10-02T14:53:02-0500 info app : didEnterBackgroundNotification

and then the final entries for this session:

 2023-10-02T14:53:02-0500 debug grpc_bg : h2_active_streams=11 grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 grpc.conn.addr_local=192.168.189.221 poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc.conn.addr_remote=18.116.194.70 h2_stream_id=HTTP2StreamID(77) HTTP2 stream created
 2023-10-02T14:53:02-0500 trace grpc_call : grpc_request_id=00A107D5-8DD6-4B3C-84C2-221385333AE4 grpc.conn.addr_local=192.168.189.221 grpc.conn.addr_remote=18.116.194.70 request buffer drained
 2023-10-02T14:53:03-0500 debug grpc_bg : h2_active_streams=10 grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 grpc.conn.addr_local=192.168.189.221 poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc.conn.addr_remote=18.116.194.70 h2_stream_id=HTTP2StreamID(77) HTTP2 stream closed
 2023-10-02T14:53:03-0500 trace grpc_call : grpc_request_id=00A107D5-8DD6-4B3C-84C2-221385333AE4 h2_headers=[([], ":status", "200"), ([], "date", "Mon, 02 Oct 2023 19:53:03 GMT"), ([], "content-type", "application/grpc"), ([non-indexable], "server", "Kestrel")] h2_end_stream=false h2_payload=HEADERS received HTTP2 frame
 2023-10-02T14:53:03-0500 trace grpc_call : h2_payload=DATA h2_end_stream=false grpc_request_id=00A107D5-8DD6-4B3C-84C2-221385333AE4 h2_data_bytes=83 received HTTP2 frame
 2023-10-02T14:53:03-0500 info tsk_completed_games : await it.next()
 2023-10-02T14:53:03-0500 trace grpc_call : h2_payload=HEADERS h2_end_stream=true grpc_request_id=00A107D5-8DD6-4B3C-84C2-221385333AE4 h2_headers=[([non-indexable], "grpc-status", "0")] received HTTP2 frame

So AFAICT, everything here seems fine.

Around an hour and a half later, I pick up the iPad and open the app again:

 2023-10-02T16:19:31-0500 info app : applicationWillEnterForeground
 2023-10-02T16:19:31-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 connectivity_state=ready deactivating connection
 2023-10-02T16:19:31-0500 debug grpc_bg : delay=0 poolmanager.id=ObjectIdentifier(0x000000028286a780) grpc_connection_id=7E274071-B86A-4AF4-BAE0-AE35245F1B98/71 scheduling connection attempt
 2023-10-02T16:19:31-0500 info ConnectionStatus : ConnectivityDelegate: connectionClosed
 2023-10-02T16:19:31-0500 trace grpc_call : grpc.conn.addr_local=192.168.189.221 grpc.conn.addr_remote=18.116.194.70 call_state=closed grpc_request_id=FED0F8D2-0A5B-4065-A099-627F0C4877AC failing buffered writes
 2023-10-02T16:19:31-0500 error PlayerModel 1024 : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-02T16:19:31-0500 info PlayerModel 1024 : top of loop
 2023-10-02T16:19:31-0500 error tsk_active_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-02T16:19:31-0500 info tsk_active_games : top of loop
 2023-10-02T16:19:31-0500 trace grpc_call : grpc.conn.addr_local=192.168.189.221 grpc.conn.addr_remote=18.116.194.70 call_state=closed grpc_request_id=3987031E-963B-4974-B89F-D6EDA2B0B61E failing buffered writes
 2023-10-02T16:19:32-0500 info app : applicationDidBecomeActive

Just like the previous time the app came to the foreground, it notices that the connection became inactive and schedules a reconnect, but that reconnect never actually happens. For some reason, only a couple of the stream loops caught an exception; the others seem to still be sitting at an await. All such loops in the app are structured to catch and log any exceptions.

After the entries shown just above, I see no further log messages from the grpc layer. The app is still running, and the log contains plenty of other output from my own loggers. There are even some attempts to make other grpc calls, but those calls appear to be stuck forever at an await, and the grpc loggers have gone completely silent.

The backend server (which is in C# BTW) is still running, and the network connectivity is working. Another iOS device running the same app on the same network as the iPad was able to access the server without problems.

The code to set things up is:

        // Keepalive: ping every 30 seconds; a ping times out after 20 seconds.
        let keepalive = ClientConnectionKeepalive(
            interval: .seconds(30),
            timeout: .seconds(20)
        )

        // One event loop; on iOS this resolves to the Network.framework (NIOTS) backend.
        let group = PlatformSupport.makeEventLoopGroup(loopCount: 1)

        let channel = try GRPCChannelPool.with(
            target: .host(BuildConfiguration.shared.host, port: BuildConfiguration.shared.port),
            transportSecurity: BuildConfiguration.shared.transportSecurity(group: group),
            eventLoopGroup: group
        ) { cfg in
            cfg.delegate = conn_delegate
            cfg.backgroundActivityLogger = LogInfo.shared.logger("grpc_bg", logLevel: .trace)
            cfg.keepalive = keepalive
            //cfg.connectionBackoff
            //cfg.idleTimeout = TimeAmount.seconds(1)
        }

        self.client = Greet_GreeterAsyncClient(channel: channel)

A similar failure happened yesterday. At about 21:49, the logs show a "successful" session like the one described above. The app comes to the foreground, realizes that its connection is inactive, recovers, does a bunch of stuff, and everything looks fine.

Just over 20 minutes later, I bring the app up again and I get this:

 2023-10-03T22:10:33-0500 info app : applicationWillEnterForeground
 2023-10-03T22:10:33-0500 error tsk_credits : CATCH GRPCStatus : unavailable (14): Transport became inactive
 2023-10-03T22:10:33-0500 trace grpc_call : call_state=closed grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 grpc_request_id=DEFAE855-8415-4990-A253-DD398C35F9F6 failing buffered writes
 2023-10-03T22:10:33-0500 error tsk_completed_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-03T22:10:33-0500 info tsk_completed_games : top of loop
 2023-10-03T22:10:33-0500 trace grpc_call : call_state=closed grpc.conn.addr_remote=3.132.145.240 grpc_request_id=C37D54EA-063F-4383-B8C5-177502594FF9 grpc.conn.addr_local=192.168.1.41 failing buffered writes
 2023-10-03T22:10:33-0500 trace grpc_call : grpc_request_id=967B1816-4961-412E-8355-29F2BD27D947 grpc.conn.addr_remote=3.132.145.240 grpc.conn.addr_local=192.168.1.41 call_state=closed failing buffered writes
 2023-10-03T22:10:33-0500 error PlayerModel 1024 : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-03T22:10:33-0500 info PlayerModel 1024 : top of loop
 2023-10-03T22:10:33-0500 info tsk_completed_games : await it.next()
 2023-10-03T22:10:33-0500 error tsk_invites : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-03T22:10:33-0500 trace grpc_call : grpc_request_id=D733A20A-C739-4C48-B78C-97DDDD5E464C grpc.conn.addr_remote=3.132.145.240 call_state=closed grpc.conn.addr_local=192.168.1.41 failing buffered writes
 2023-10-03T22:10:33-0500 info tsk_invites : top of loop
 2023-10-03T22:10:33-0500 error PlayerModel 1025 : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-03T22:10:33-0500 info PlayerModel 1025 : top of loop
 2023-10-03T22:10:33-0500 trace grpc_call : grpc_request_id=686DFD0E-ABD1-4865-9A1C-259BB73C7FFB call_state=closed grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 failing buffered writes
 2023-10-03T22:10:33-0500 error PlayerModel 1028 : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-03T22:10:33-0500 info PlayerModel 1028 : top of loop
 2023-10-03T22:10:33-0500 trace grpc_call : grpc.conn.addr_remote=3.132.145.240 grpc.conn.addr_local=192.168.1.41 grpc_request_id=3ACFB962-A556-426E-BD9A-9EA0CBE1E7B4 call_state=closed failing buffered writes
 2023-10-03T22:10:33-0500 error tsk_active_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
 2023-10-03T22:10:33-0500 info tsk_active_games : top of loop
 2023-10-03T22:10:33-0500 trace grpc_call : grpc_request_id=3B10C929-4B1F-443B-8042-8CFF6832CC4C call_state=closed grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 failing buffered writes
 2023-10-03T22:10:33-0500 info app : applicationDidBecomeActive
 2023-10-03T22:10:50-0500 info MyPlayerList : calling server to list players, kind = suggested
 2023-10-03T22:10:50-0500 info MyPlayerList : calling server to list players, kind = previous
 2023-10-03T22:11:48-0500 info MyPlayerList : calling server to list players, kind = suggested
 2023-10-03T22:11:48-0500 info MyPlayerList : calling server to list players, kind = previous
 2023-10-03T22:12:12-0500 info MyPlayerList : calling server to list players, kind = suggested
 2023-10-03T22:12:12-0500 info MyPlayerList : calling server to list players, kind = previous
 2023-10-03T22:13:57-0500 info app : willResignActiveNotification
 2023-10-03T22:13:57-0500 info app : didEnterBackgroundNotification

This one is slightly different. All the loops caught an error and logged it. But grpc_bg and the connectivity delegate aren't saying anything at all. I kept the app running and tried a few more times to bring it into the foreground. Like the previous case, the app seems mostly fine, but nothing grpc-related is working, and the grpc loggers are silent on the matter.

I claimed at the top that this was a question, so I should ask one.

Any idea what would cause the behavior I'm seeing, where grpc-swift seems to have stopped managing the connection with no log messages or errors to explain why?

Thanks for filing an issue with so much information, Eric! I've been pondering this one for a little while, and it's not immediately obvious what's going wrong here.

I'd like to look at two separate issues:

  1. Some streams aren't terminated when the connection drops, leading to indefinite awaits in your loops.
  2. The background connection management appears to stop running.

The first issue is odd: your configuration uses an EventLoopGroup with one loop and you don't adjust the number of connections per loop. The default is one, so you'll have at most one connection at a time (not strictly true: you could have more if the server sent a GOAWAY frame, but I'll assume that isn't the case for now). The bit that's odd is that some streams are closed but not all. Presumably the streams are closed because the connection was closed, in which case all streams should be closed.

Does your connectivity delegate show any more information about the connection? Ideally it would log in each of its methods.
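
Something along these lines would do (a sketch only; the protocol and method names here are reconstructed from the log output above and from memory, so treat them as assumptions that may not match your grpc-swift version exactly):

    import GRPC
    import Logging

    // Sketch: a pool delegate that logs in every method. Names and signatures
    // are assumptions and may need adjusting to the version in use.
    final class LoggingPoolDelegate: GRPCConnectionPoolDelegate, Sendable {
        private let logger = Logger(label: "ConnectionStatus")

        func connectionAdded(id: GRPCConnectionID) { logger.info("connectionAdded: \(id)") }
        func connectionRemoved(id: GRPCConnectionID) { logger.info("connectionRemoved: \(id)") }
        func startedConnecting(id: GRPCConnectionID) { logger.info("startedConnecting: \(id)") }
        func connectFailed(id: GRPCConnectionID, error: Error) {
            logger.info("connectFailed: \(id): \(error)")
        }
        func connectSucceeded(id: GRPCConnectionID, streamCapacity: Int) {
            logger.info("connectSucceeded: \(id), streamCapacity = \(streamCapacity)")
        }
        func connectionUtilizationChanged(id: GRPCConnectionID, streamsUsed: Int, streamCapacity: Int) {
            logger.info("utilization: \(id), \(streamsUsed)/\(streamCapacity)")
        }
        func connectionQuiescing(id: GRPCConnectionID) { logger.info("connectionQuiescing: \(id)") }
        func connectionClosed(id: GRPCConnectionID, error: Error?) {
            logger.info("connectionClosed: \(id), error: \(String(describing: error))")
        }
    }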

The second issue is also a bit curious: the underlying eventing system appears to be working because streams are failing. Does the connectivity delegate show anything around that time? I'm wondering if the connection is in a backoff period waiting to reconnect.

Another thing I'd be curious to see is changing to a different networking stack. SwiftNIO can run using POSIX sockets or on Apple's Network.framework. At the moment you're using PlatformSupport.makeEventLoopGroup and running on iOS, so you'll be getting Network.framework. Could you try calling makeEventLoopGroup(loopCount: 1, networkPreference: .userDefined(.posix))? This will give you the POSIX stack. If these issues don't reproduce, then it might be related to how we're holding Network.framework.
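
For example (a sketch; the call is the one named above, and the rest of your setup stays the same):

    // Sketch: request the POSIX-socket stack instead of Network.framework; the
    // call is the one named above and the rest of the setup is unchanged.
    let group = PlatformSupport.makeEventLoopGroup(
        loopCount: 1,
        networkPreference: .userDefined(.posix)
    )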

It looks like the connectivity delegate is logging on every method except connectionUtilizationChanged. I think I turned that one off at some point because it was noisy, and then forgot to turn it back on.

Basically, except for that one method, I'm pretty sure that I included all connectivity delegate log messages that actually happened. There are places where I expect to see more log messages from the connectivity delegate, but they're not there.

On the second issue you mentioned, I've convinced myself (perhaps erroneously, but still) that this is not just a matter of the connection being in the backoff period. Mostly my evidence here is that I understand the max backoff period to be about 2 minutes, and I've intentionally kept the app active for longer than that. I didn't show the logs beyond a certain point, but in one of those tests I kept the app in the foreground for 5 minutes or so, and then continued to use the app periodically for hours. So there are lots of log messages from other parts of the app during that time, but nothing at all from the grpc layer.

I'll try a build with the posix layer.

I've also been tweaking our build process to let us use a fork of grpc-swift so I can add more logging. For example, I've added a trace log message to show each ping sent.

I'm looking at trying to add a log message in ConnectionManager.swift somewhere around here:

        // Yes, start connecting now. We should go via `transientFailure`, however.
        let scheduled = self.eventLoop.scheduleTask(in: .nanoseconds(0)) {
          self.startConnecting()
        }
        self.logger.debug("scheduling connection attempt", metadata: ["delay": "0"])
        let backoffIterator = self.connectionBackoff?.makeIterator()
        self.state = .transientFailure(TransientFailureState(
          from: ready,
          scheduled: scheduled,
          backoffIterator: backoffIterator
        ))

For the first failing case in the original post above, I think this is the related code, and I can't see any evidence that startConnecting ever actually ran.
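
For example, one candidate is inside the scheduled task itself, so I can see whether it ever fires. A sketch of the kind of change I mean, using only calls that already appear in the excerpt above (the exact logger API inside ConnectionManager may differ):

        let scheduled = self.eventLoop.scheduleTask(in: .nanoseconds(0)) {
          self.logger.debug("scheduled connection attempt firing")  // added line
          self.startConnecting()
        }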

If you have any suggestions of where I might add more log messages that might be diagnostically helpful, let me know.

Thanks for the reply!

We have reproduced the problem with the posix layer.

2023-10-07T09:12:36-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 making client bootstrap with event loop group of type SelectableEventLoop
2023-10-07T09:12:36-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap
2023-10-07T09:12:36-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 creating a ClientBootstrap
2023-10-07T09:12:36-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 connectivity_state=connecting activating connection
2023-10-07T09:12:36-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 TLS handshake completed, negotiated protocol: h2
2023-10-07T09:12:36-0500 debug grpc_bg : tls_version=tlsv13 alpn=h2 grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc.conn.addr_remote=3.21.141.90 grpc.conn.addr_local=192.168.1.41 TLS handshake completed
2023-10-07T09:12:36-0500 debug grpc_bg : h2_settings_initial_window_size=65536 h2_settings_max_concurrent_streams=128 grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 poolmanager.id=ObjectIdentifier(0x0000000281f27a00) h2_settings_max_frame_size=16777215 grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.21.141.90 HTTP2 settings update
2023-10-07T09:12:36-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 connectivity_state=active connection ready
2023-10-07T09:12:36-0500 info ConnectionStatus : ConnectivityDelegate: connectSucceeded

Here's the end of the log, starting just before things go down:

2023-10-07T09:13:43-0500 trace grpc_call : grpc_request_id=B0444DC0-1AB1-4DCE-BC06-C20E92726A99 h2_end_stream=false h2_payload=HEADERS h2_headers=[([], ":status", "200"), ([], "date", "Sat, 07 Oct 2023 14:13:43 GMT"), ([], "content-type", "application/grpc"), ([non-indexable], "server", "Kestrel")] received HTTP2 frame
2023-10-07T09:13:43-0500 trace grpc_call : grpc_request_id=B0444DC0-1AB1-4DCE-BC06-C20E92726A99 h2_payload=DATA h2_data_bytes=5 h2_end_stream=false received HTTP2 frame
2023-10-07T09:13:43-0500 trace grpc_call : grpc_request_id=B0444DC0-1AB1-4DCE-BC06-C20E92726A99 h2_end_stream=true h2_payload=HEADERS h2_headers=[([non-indexable], "grpc-status", "0")] received HTTP2 frame
2023-10-07T09:13:43-0500 trace grpc_call : grpc_request_id=ED06C8BC-FC5C-4607-B4F2-4F8459D62984 h2_payload=DATA h2_data_bytes=37 h2_end_stream=false received HTTP2 frame
2023-10-07T09:13:43-0500 info game 23 : NOTIFY GAME DATA: len = 15  offset = 86
2023-10-07T09:13:43-0500 info game 23 : await it.next()
2023-10-07T09:13:48-0500 info app : willResignActiveNotification
2023-10-07T09:13:49-0500 info app : didEnterBackgroundNotification
2023-10-07T09:14:38-0500 info app : applicationWillEnterForeground
2023-10-07T09:14:39-0500 info app : applicationDidBecomeActive
2023-10-07T09:14:51-0500 info app : willResignActiveNotification
2023-10-07T09:14:52-0500 info app : didEnterBackgroundNotification
2023-10-07T09:17:58-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 connectivity_state=ready deactivating connection
2023-10-07T09:17:58-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000281f27a00) grpc_connection_id=52CE6F6E-4139-45FE-99C6-0F351AAF6E5C/24 delay=0 scheduling connection attempt
2023-10-07T09:17:58-0500 info ConnectionStatus : ConnectivityDelegate: connectionClosed
2023-10-07T09:17:58-0500 trace grpc_call : grpc_request_id=2A9F1BB5-815D-4A64-A72E-98F5A47B6C32 grpc.conn.addr_local=192.168.1.41 call_state=closed grpc.conn.addr_remote=3.21.141.90 failing buffered writes
2023-10-07T09:17:58-0500 error tsk_credits : CATCH GRPCStatus : unavailable (14): Transport became inactive
2023-10-07T09:17:58-0500 trace grpc_call : grpc_request_id=11D1EE66-0196-4054-9B50-BE5FBACEADC7 grpc.conn.addr_local=192.168.1.41 call_state=closed grpc.conn.addr_remote=3.21.141.90 failing buffered writes
2023-10-07T09:17:58-0500 error PlayerModel 4 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T09:17:58-0500 info PlayerModel 4 : top of loop
2023-10-07T09:17:58-0500 error game 23 : CATCH GRPCStatus unavailable (14): Transport became inactive
2023-10-07T09:17:58-0500 info game 23 : top of loop
2023-10-07T09:17:58-0500 trace grpc_call : grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.21.141.90 call_state=closed grpc_request_id=ED06C8BC-FC5C-4607-B4F2-4F8459D62984 failing buffered writes
2023-10-07T09:17:58-0500 trace grpc_call : grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.21.141.90 call_state=closed grpc_request_id=EF347047-5B65-4502-8830-700553E17223 failing buffered writes
2023-10-07T09:17:58-0500 error tsk_active_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T09:17:58-0500 info tsk_active_games : top of loop
2023-10-07T09:17:58-0500 error PlayerModel 1027 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T09:17:58-0500 info PlayerModel 1027 : top of loop
2023-10-07T09:17:58-0500 trace grpc_call : grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.21.141.90 call_state=closed grpc_request_id=756FEB0B-FF12-4885-A831-7BD991C434B6 failing buffered writes
2023-10-07T09:17:58-0500 info app : applicationWillEnterForeground
2023-10-07T09:17:58-0500 info game 23 : await it.next()
2023-10-07T09:17:59-0500 info app : applicationDidBecomeActive
2023-10-07T09:18:13-0500 info app : willResignActiveNotification
2023-10-07T09:18:13-0500 info app : didEnterBackgroundNotification
2023-10-07T10:51:15-0500 info app : applicationWillEnterForeground
2023-10-07T10:51:16-0500 info app : applicationDidBecomeActive
2023-10-07T10:51:26-0500 info MyPlayerList : calling server to list players, kind = suggested
2023-10-07T10:51:26-0500 info MyPlayerList : calling server to list players, kind = previous
2023-10-07T10:51:59-0500 info MyPlayerList : calling server to list players, kind = suggested
2023-10-07T10:51:59-0500 info MyPlayerList : calling server to list players, kind = previous
2023-10-07T10:52:10-0500 info MyPlayerList : calling server to list players, kind = bots
2023-10-07T10:53:17-0500 info app : willResignActiveNotification
2023-10-07T10:53:17-0500 info app : didEnterBackgroundNotification
2023-10-07T10:54:28-0500 info app : applicationWillEnterForeground
2023-10-07T10:54:29-0500 info app : applicationDidBecomeActive
2023-10-07T10:54:36-0500 info MyPlayerList : calling server to list players, kind = suggested
2023-10-07T10:54:36-0500 info MyPlayerList : calling server to list players, kind = previous
2023-10-07T10:56:35-0500 info app : willResignActiveNotification
2023-10-07T10:56:35-0500 info app : didEnterBackgroundNotification
2023-10-07T11:02:24-0500 info app : applicationWillEnterForeground
2023-10-07T11:02:24-0500 info app : applicationDidBecomeActive
2023-10-07T11:02:30-0500 info app : willResignActiveNotification
2023-10-07T11:02:30-0500 info app : didEnterBackgroundNotification

We can see around 09:13 that things are healthy, and some info arrived from the server.

When the app is brought to the foreground at 09:17:58, we see the same basic problem described previously in this thread. There is a "scheduling connection attempt", but no evidence that the attempt ever happened, and no further output from the grpc background activity logger.

I kept the app around for a while before killing it. The messages at 10:51:26 involve several attempted gRPC calls. Those calls reach the point of an await, but interestingly there are no grpc logger messages at all, not even from the grpc_call logger.

We've made some small improvements to the logging since this build was made, including log messages for the pings and pongs, as well as more complete information in the connectivity delegate's log messages.

FWIW, here's another log showing the problem, including the small logging improvements I mentioned in the previous comment.

The top section of the log shows things getting set up.

Then there's a section where I brought the app into the foreground at 22:33:51 last night. This one shows successfully re-establishing the connection and beginning to make calls on it.

Finally, there's a section where I brought the app into the foreground at 08:21:33 this morning. This one shows the failure case that is the topic of this issue.

2023-10-07T22:20:06-0500 info app : UIDevice.name: iPad
2023-10-07T22:20:06-0500 info app : UIDevice.systemName: iPadOS
2023-10-07T22:20:06-0500 info app : UIDevice.systemVersion: 17.0.3
2023-10-07T22:20:06-0500 info app : UIDevice.model: iPad
2023-10-07T22:20:06-0500 info app : UIDevice.localizedModel: iPad
2023-10-07T22:20:06-0500 info app : UIDevice.userInterfaceIdiom: UIUserInterfaceIdiom(rawValue: 1)
2023-10-07T22:20:06-0500 info app : UIDevice.batteryLevel: -1.0
2023-10-07T22:20:06-0500 info app : CFBundleShortVersionString: 1.0
2023-10-07T22:20:06-0500 info app : CFBundleVersion: 48
2023-10-07T22:20:06-0500 info MyClient : keepalive: ClientConnectionKeepalive(interval: NIOCore.TimeAmount(nanoseconds: 30000000000), timeout: NIOCore.TimeAmount(nanoseconds: 20000000000), permitWithoutCalls: false, maximumPingsWithoutData: 2, minimumSentPingIntervalWithoutData: NIOCore.TimeAmount(nanoseconds: 300000000000))
2023-10-07T22:20:06-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) poolmanager.pools.conns_per_pool=1 poolmanager.pools.waiters_per_pool=100 poolmanager.pools.count=1 initializing connection pool manager
2023-10-07T22:20:06-0500 info ConnectionStatus : ConnectivityDelegate: connectionAdded
2023-10-07T22:20:06-0500 info AppDelegate : didFinishLaunchingWithOptions
2023-10-07T22:20:07-0500 trace grpc_bg : pool.waiters.count=1 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:20:07-0500 debug grpc_bg : pool.reservations.count=1 pool.reservations.capacity=0 pool.reservations.load=inf pool.reservations.loadThreshold=0.9 poolmanager.id=ObjectIdentifier(0x0000000282665780) stream reservation load factor greater than or equal to threshold, bringing up additional connection if available
2023-10-07T22:20:07-0500 info ConnectionStatus : ConnectivityDelegate: startedConnecting
2023-10-07T22:20:07-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 making client bootstrap with event loop group of type NIOTSEventLoop
2023-10-07T22:20:07-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 Network.framework is available and the EventLoopGroup is compatible with NIOTS, creating a NIOTSConnectionBootstrap
2023-10-07T22:20:07-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 connectivity_state=connecting activating connection
2023-10-07T22:20:07-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 tls_version=tlsv13 alpn=h2 TLS handshake completed
2023-10-07T22:20:07-0500 debug grpc_bg : h2_settings_max_frame_size=16777215 grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 h2_settings_max_concurrent_streams=128 poolmanager.id=ObjectIdentifier(0x0000000282665780) h2_settings_initial_window_size=65536 grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 HTTP2 settings update
2023-10-07T22:20:07-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 connectivity_state=active connection ready
2023-10-07T22:20:07-0500 info ConnectionStatus : ConnectivityDelegate: connectSucceeded: streamCapacity = 128

...

2023-10-07T22:20:09-0500 trace grpc_call : h2_payload=HEADERS grpc_request_id=A07F2C4B-FE1B-4B01-9C40-2E3C4430A32A h2_headers=[([], ":status", "200"), ([], "date", "Sun, 08 Oct 2023 03:20:09 GMT"), ([], "content-type", "application/grpc"), ([non-indexable], "server", "Kestrel")] h2_end_stream=false received HTTP2 frame
2023-10-07T22:20:09-0500 trace grpc_call : grpc_request_id=A07F2C4B-FE1B-4B01-9C40-2E3C4430A32A h2_payload=DATA h2_data_bytes=83 h2_end_stream=false received HTTP2 frame
2023-10-07T22:20:09-0500 trace grpc_call : h2_headers=[([non-indexable], "grpc-status", "0")] h2_payload=HEADERS grpc_request_id=A07F2C4B-FE1B-4B01-9C40-2E3C4430A32A h2_end_stream=true received HTTP2 frame
2023-10-07T22:20:09-0500 info tsk_completed_games : await it.next()
2023-10-07T22:20:14-0500 info app : willResignActiveNotification
2023-10-07T22:20:15-0500 info app : didEnterBackgroundNotification
2023-10-07T22:33:51-0500 trace grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 ping
2023-10-07T22:33:51-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 connectivity_state=ready deactivating connection
2023-10-07T22:33:51-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/0 delay=0 scheduling connection attempt
2023-10-07T22:33:51-0500 info ConnectionStatus : ConnectivityDelegate: connectionClosed: POSIXErrorCode(rawValue: 53): Software caused connection abort
2023-10-07T22:33:51-0500 error tsk_active_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 info tsk_active_games : top of loop
2023-10-07T22:33:51-0500 info game 5 : GOT CANCEL IN GAME DATA
2023-10-07T22:33:51-0500 trace grpc_call : call_state=closed grpc_request_id=03185C95-9B41-4DC1-B293-D2EDABBC9830 grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 failing buffered writes
2023-10-07T22:33:51-0500 error game 5 : CATCH GRPCStatus unavailable (14)
2023-10-07T22:33:51-0500 info game 5 : top of loop
2023-10-07T22:33:51-0500 info game 23 : GOT CANCEL IN GAME DATA
2023-10-07T22:33:51-0500 error game 23 : CATCH GRPCStatus unavailable (14)
2023-10-07T22:33:51-0500 info game 23 : top of loop
2023-10-07T22:33:51-0500 trace grpc_call : call_state=closed grpc.conn.addr_local=192.168.1.41 grpc_request_id=5D528168-AC67-401F-B783-F88527576579 grpc.conn.addr_remote=3.132.145.240 failing buffered writes
2023-10-07T22:33:51-0500 error tsk_completed_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 info tsk_completed_games : top of loop
2023-10-07T22:33:51-0500 error PlayerModel 1027 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 info PlayerModel 1027 : top of loop
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=363B9D89-95B7-4DDB-B78B-00E2A6E362ED call_state=closed grpc.conn.addr_remote=3.132.145.240 grpc.conn.addr_local=192.168.1.41 failing buffered writes
2023-10-07T22:33:51-0500 error tsk_invites : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 info tsk_invites : top of loop
2023-10-07T22:33:51-0500 info game 23 : GOT CANCEL IN GAME DATA
2023-10-07T22:33:51-0500 info tsk_completed_games : await it.next()
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=1FBFB7A9-CBC6-442C-9080-C700A545C12C grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 call_state=closed failing buffered writes
2023-10-07T22:33:51-0500 info game 5 : GOT CANCEL IN GAME DATA
2023-10-07T22:33:51-0500 error PlayerModel 1024 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=40CD0BA5-1F80-4DE8-97C4-E1E4C9409A4C grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 call_state=closed failing buffered writes
2023-10-07T22:33:51-0500 info PlayerModel 1024 : top of loop
2023-10-07T22:33:51-0500 error PlayerModel 3 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 info PlayerModel 3 : top of loop
2023-10-07T22:33:51-0500 info game 23 : await it.next()
2023-10-07T22:33:51-0500 error game 23 : CATCH GRPCStatus cancelled (1)
2023-10-07T22:33:51-0500 info game 5 : await it.next()
2023-10-07T22:33:51-0500 error game 5 : CATCH GRPCStatus cancelled (1)
2023-10-07T22:33:51-0500 trace grpc_call : call_state=closed grpc_request_id=A7D2CE10-2E80-4E30-9EF7-87B438372D81 grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 failing buffered writes
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=C8F2477E-59B2-4949-8F97-4A9DA2CB4849 call_state=closed grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 failing buffered writes
2023-10-07T22:33:51-0500 error tsk_credits : CATCH GRPCStatus : unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=40F89149-D770-424E-9B2C-5EDDB61454D5 call_state=closed grpc.conn.addr_remote=3.132.145.240 grpc.conn.addr_local=192.168.1.41 failing buffered writes
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=04FA94E0-C542-40BB-9440-E968C46BB720 call_state=closed grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 failing buffered writes
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=B6BF1561-6ED3-465E-A2F9-42BA4F052B29 call_state=closed grpc.conn.addr_remote=3.132.145.240 grpc.conn.addr_local=192.168.1.41 failing buffered writes
2023-10-07T22:33:51-0500 error PlayerModel 2 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 info PlayerModel 2 : top of loop
2023-10-07T22:33:51-0500 error PlayerModel 1029 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-07T22:33:51-0500 info PlayerModel 1029 : top of loop
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=206B5B30-FA54-4C30-AB23-CC59F579F0CB grpc.conn.addr_local=192.168.1.41 grpc.conn.addr_remote=3.132.145.240 call_state=closed failing buffered writes
2023-10-07T22:33:51-0500 info ConnectionStatus : ConnectivityDelegate: startedConnecting
2023-10-07T22:33:51-0500 debug grpc_call : grpc_request_id=4FA1E872-FF75-4B45-96D4-311F8F88B450 pool.waiters.count=0 pool.waiter.id=ObjectIdentifier(0x000000028124c460) waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : pool.waiters.count=1 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:33:51-0500 debug grpc_call : grpc_request_id=C0690E44-DAB3-48EC-B67D-109DE644196D pool.waiters.count=1 pool.waiter.id=ObjectIdentifier(0x000000028105c4b0) waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : pool.waiters.count=2 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=4FA1E872-FF75-4B45-96D4-311F8F88B450 request_part=metadata call_state=awaitingTransport (0 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=4FA1E872-FF75-4B45-96D4-311F8F88B450 request_part=message call_state=awaitingTransport (1 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=4FA1E872-FF75-4B45-96D4-311F8F88B450 request_part=end call_state=awaitingTransport (2 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=C0690E44-DAB3-48EC-B67D-109DE644196D request_part=metadata call_state=awaitingTransport (0 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=C0690E44-DAB3-48EC-B67D-109DE644196D request_part=message call_state=awaitingTransport (1 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=C0690E44-DAB3-48EC-B67D-109DE644196D request_part=end call_state=awaitingTransport (2 parts buffered) buffering request part
2023-10-07T22:33:51-0500 debug grpc_call : pool.waiter.id=ObjectIdentifier(0x000000028124d7c0) grpc_request_id=0175DA91-B19C-4E5C-8C03-46810C2760BC pool.waiters.count=2 waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : pool.waiters.count=3 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : call_state=closing grpc_request_id=C0690E44-DAB3-48EC-B67D-109DE644196D failing buffered writes
2023-10-07T22:33:51-0500 debug grpc_call : pool.waiter.id=ObjectIdentifier(0x000000028124c550) grpc_request_id=7FF280F1-C940-469E-9F9F-8184D417A40D pool.waiters.count=3 waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : pool.waiters.count=4 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (0 parts buffered) request_part=metadata grpc_request_id=7FF280F1-C940-469E-9F9F-8184D417A40D buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (1 parts buffered) request_part=message grpc_request_id=7FF280F1-C940-469E-9F9F-8184D417A40D buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (2 parts buffered) request_part=end grpc_request_id=7FF280F1-C940-469E-9F9F-8184D417A40D buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (0 parts buffered) request_part=metadata grpc_request_id=0175DA91-B19C-4E5C-8C03-46810C2760BC buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (1 parts buffered) request_part=message grpc_request_id=0175DA91-B19C-4E5C-8C03-46810C2760BC buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (2 parts buffered) request_part=end grpc_request_id=0175DA91-B19C-4E5C-8C03-46810C2760BC buffering request part
2023-10-07T22:33:51-0500 debug grpc_call : pool.waiter.id=ObjectIdentifier(0x000000028124d220) grpc_request_id=EC6CE161-1373-4609-AA85-868247AAA3F7 pool.waiters.count=4 waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : pool.waiters.count=5 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (0 parts buffered) request_part=metadata grpc_request_id=EC6CE161-1373-4609-AA85-868247AAA3F7 buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (1 parts buffered) request_part=message grpc_request_id=EC6CE161-1373-4609-AA85-868247AAA3F7 buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (2 parts buffered) request_part=end grpc_request_id=EC6CE161-1373-4609-AA85-868247AAA3F7 buffering request part
2023-10-07T22:33:51-0500 debug grpc_call : pool.waiter.id=ObjectIdentifier(0x000000028124f660) grpc_request_id=BD1B6640-DA97-477C-BAEA-DFE995042BC0 pool.waiters.count=5 waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : pool.waiters.count=6 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (0 parts buffered) request_part=metadata grpc_request_id=BD1B6640-DA97-477C-BAEA-DFE995042BC0 buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=BD1B6640-DA97-477C-BAEA-DFE995042BC0 request_part=message call_state=awaitingTransport (1 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=BD1B6640-DA97-477C-BAEA-DFE995042BC0 request_part=end call_state=awaitingTransport (2 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=0175DA91-B19C-4E5C-8C03-46810C2760BC call_state=closing failing buffered writes
2023-10-07T22:33:51-0500 debug grpc_call : grpc_request_id=341E1CD1-69EE-4C12-A3AE-660271FFF86A pool.waiters.count=6 pool.waiter.id=ObjectIdentifier(0x000000028105dc70) waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : pool.waiters.count=7 poolmanager.id=ObjectIdentifier(0x0000000282665780) enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=341E1CD1-69EE-4C12-A3AE-660271FFF86A request_part=metadata call_state=awaitingTransport (0 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=341E1CD1-69EE-4C12-A3AE-660271FFF86A request_part=message call_state=awaitingTransport (1 parts buffered) buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=341E1CD1-69EE-4C12-A3AE-660271FFF86A request_part=end call_state=awaitingTransport (2 parts buffered) buffering request part
2023-10-07T22:33:51-0500 debug grpc_call : grpc_request_id=883ED334-C940-4F29-A637-D5A135382640 pool.waiters.count=7 pool.waiter.id=ObjectIdentifier(0x000000028105ceb0) waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) pool.waiters.count=8 enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (0 parts buffered) request_part=metadata grpc_request_id=883ED334-C940-4F29-A637-D5A135382640 buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (1 parts buffered) request_part=message grpc_request_id=883ED334-C940-4F29-A637-D5A135382640 buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (2 parts buffered) request_part=end grpc_request_id=883ED334-C940-4F29-A637-D5A135382640 buffering request part
2023-10-07T22:33:51-0500 debug grpc_call : pool.waiters.count=8 pool.waiter.id=ObjectIdentifier(0x000000028124e670) grpc_request_id=52CAC6E0-FB4B-449F-AB57-22F9B226420D waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) pool.waiters.count=9 enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (0 parts buffered) request_part=metadata grpc_request_id=52CAC6E0-FB4B-449F-AB57-22F9B226420D buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (1 parts buffered) request_part=message grpc_request_id=52CAC6E0-FB4B-449F-AB57-22F9B226420D buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : call_state=awaitingTransport (2 parts buffered) request_part=end grpc_request_id=52CAC6E0-FB4B-449F-AB57-22F9B226420D buffering request part
2023-10-07T22:33:51-0500 debug grpc_call : pool.waiters.count=9 pool.waiter.id=ObjectIdentifier(0x000000028124f070) grpc_request_id=FA38AF5C-3B10-4DB3-B33E-5AF6897D3AF7 waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) pool.waiters.count=10 enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=FA38AF5C-3B10-4DB3-B33E-5AF6897D3AF7 call_state=awaitingTransport (0 parts buffered) request_part=metadata buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=FA38AF5C-3B10-4DB3-B33E-5AF6897D3AF7 call_state=awaitingTransport (1 parts buffered) request_part=message buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=FA38AF5C-3B10-4DB3-B33E-5AF6897D3AF7 call_state=awaitingTransport (2 parts buffered) request_part=end buffering request part
2023-10-07T22:33:51-0500 debug grpc_call : grpc_request_id=CA81BD60-3755-4D41-A8B7-70F63FC98B72 pool.waiter.id=ObjectIdentifier(0x000000028105dc20) pool.waiters.count=10 waiting for a connection to become available
2023-10-07T22:33:51-0500 trace grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) pool.waiters.count=11 enqueued connection waiter
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=CA81BD60-3755-4D41-A8B7-70F63FC98B72 call_state=awaitingTransport (0 parts buffered) request_part=metadata buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=CA81BD60-3755-4D41-A8B7-70F63FC98B72 call_state=awaitingTransport (1 parts buffered) request_part=message buffering request part
2023-10-07T22:33:51-0500 trace grpc_call : grpc_request_id=CA81BD60-3755-4D41-A8B7-70F63FC98B72 call_state=awaitingTransport (2 parts buffered) request_part=end buffering request part
2023-10-07T22:33:51-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/1 making client bootstrap with event loop group of type NIOTSEventLoop
2023-10-07T22:33:51-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/1 Network.framework is available and the EventLoopGroup is compatible with NIOTS, creating a NIOTSConnectionBootstrap
2023-10-07T22:33:51-0500 info app : applicationWillEnterForeground
2023-10-07T22:33:51-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/1 connectivity_state=connecting activating connection
2023-10-07T22:33:51-0500 debug grpc_bg : grpc.conn.addr_remote=3.21.141.90 alpn=h2 grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/1 grpc.conn.addr_local=192.168.1.41 tls_version=tlsv13 poolmanager.id=ObjectIdentifier(0x0000000282665780) TLS handshake completed
2023-10-07T22:33:51-0500 debug grpc_bg : grpc.conn.addr_local=192.168.1.41 grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/1 h2_settings_max_concurrent_streams=128 h2_settings_initial_window_size=65536 grpc.conn.addr_remote=3.21.141.90 poolmanager.id=ObjectIdentifier(0x0000000282665780) h2_settings_max_frame_size=16777215 HTTP2 settings update
2023-10-07T22:33:51-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/1 connectivity_state=active connection ready
2023-10-07T22:33:51-0500 info ConnectionStatus : ConnectivityDelegate: connectSucceeded: streamCapacity = 128
2023-10-07T22:33:51-0500 trace grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) pool.waiters.count=11 servicing waiters

...

2023-10-07T23:09:48-0500 trace grpc_call : h2_headers=[([non-indexable], "grpc-status", "0")] grpc_request_id=DAB2A1CD-5C3A-4C89-BCA3-308DA898A501 h2_end_stream=true h2_payload=HEADERS received HTTP2 frame
2023-10-07T23:09:49-0500 info app : willResignActiveNotification
2023-10-07T23:09:49-0500 info app : didEnterBackgroundNotification
2023-10-08T08:21:33-0500 trace grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/4 ping
2023-10-08T08:21:33-0500 info app : applicationWillEnterForeground
2023-10-08T08:21:33-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/4 connectivity_state=ready deactivating connection
2023-10-08T08:21:33-0500 debug grpc_bg : poolmanager.id=ObjectIdentifier(0x0000000282665780) grpc_connection_id=A5AA4B66-FB5C-4BE2-9090-C520A0F139B8/4 delay=0 scheduling connection attempt
2023-10-08T08:21:33-0500 info ConnectionStatus : ConnectivityDelegate: connectionClosed: POSIXErrorCode(rawValue: 53): Software caused connection abort
2023-10-08T08:21:33-0500 error PlayerModel 3 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-08T08:21:33-0500 info PlayerModel 3 : top of loop
2023-10-08T08:21:33-0500 trace grpc_call : grpc_request_id=58F9A83A-9273-4C12-8E2F-2B700808228B grpc.conn.addr_remote=3.132.145.240 grpc.conn.addr_local=192.168.1.41 call_state=closed failing buffered writes
2023-10-08T08:21:33-0500 error PlayerModel 2 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-08T08:21:33-0500 info PlayerModel 2 : top of loop
2023-10-08T08:21:33-0500 trace grpc_call : grpc_request_id=5325AF23-C577-43E3-903D-15FE07DE1915 grpc.conn.addr_remote=3.132.145.240 grpc.conn.addr_local=192.168.1.41 call_state=closed failing buffered writes
2023-10-08T08:21:33-0500 error PlayerModel 1024 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-08T08:21:33-0500 info PlayerModel 1024 : top of loop
2023-10-08T08:21:33-0500 trace grpc_call : grpc.conn.addr_local=192.168.1.41 grpc_request_id=A1E153A4-B2AC-4693-965C-A40622773260 call_state=closed grpc.conn.addr_remote=3.132.145.240 failing buffered writes
2023-10-08T08:21:33-0500 error PlayerModel 1027 : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-08T08:21:33-0500 info PlayerModel 1027 : top of loop
2023-10-08T08:21:33-0500 trace grpc_call : grpc.conn.addr_remote=3.132.145.240 grpc_request_id=5A93C018-E340-4459-B6B3-1C829869EA19 call_state=closed grpc.conn.addr_local=192.168.1.41 failing buffered writes
2023-10-08T08:21:33-0500 trace grpc_call : grpc_request_id=463C652D-F619-4CE7-AED6-EC69D0F6A7A8 grpc.conn.addr_local=192.168.1.41 call_state=closed grpc.conn.addr_remote=3.132.145.240 failing buffered writes
2023-10-08T08:21:33-0500 error tsk_invites : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-08T08:21:33-0500 info tsk_invites : top of loop
2023-10-08T08:21:33-0500 trace grpc_call : grpc.conn.addr_remote=3.132.145.240 grpc_request_id=1D75D41B-E320-4DDD-96E7-2AF7593A1856 call_state=closed grpc.conn.addr_local=192.168.1.41 failing buffered writes
2023-10-08T08:21:33-0500 error tsk_active_games : CATCH GRPCStatus: unavailable (14): Transport became inactive
2023-10-08T08:21:33-0500 info tsk_active_games : top of loop
2023-10-08T08:21:34-0500 info app : applicationDidBecomeActive
2023-10-08T08:21:51-0500 info MyPlayerList : calling server to list players, kind = previous
2023-10-08T08:21:51-0500 info MyPlayerList : calling server to list players, kind = suggested
2023-10-08T08:22:28-0500 info app : willResignActiveNotification
2023-10-08T08:22:29-0500 info app : didEnterBackgroundNotification

Thanks for providing all of that info, Eric. Unfortunately this doesn't really tell us any more. I think the next step would be looking at a sysdiagnose. Would you be able to capture and share one when using the Network.framework implementation (along with the logs from grpc-swift)? You can find instructions on how to capture one here: https://developer.apple.com/bug-reporting/profiles-and-logs/?name=sysdiagnose

While we figure this out, a workaround is to close the connections when the app is backgrounded and recreate them when it enters the foreground again.
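
Roughly something like this (a sketch only, not drop-in code: makeChannel() is a hypothetical stand-in for your GRPCChannelPool.with setup, and you'd also need to recreate any clients that hold the old channel):

    import UIKit
    import GRPC

    // Sketch of the workaround: drop the pooled channel when the app goes to the
    // background and rebuild it on foreground. `makeChannel()` is a hypothetical
    // stand-in for the GRPCChannelPool.with(...) setup shown earlier.
    final class ChannelLifecycle {
        private var channel: GRPCChannel?
        private let makeChannel: () throws -> GRPCChannel
        private var tokens: [NSObjectProtocol] = []

        init(makeChannel: @escaping () throws -> GRPCChannel) {
            self.makeChannel = makeChannel
            self.channel = try? makeChannel()
        }

        func observeLifecycle() {
            let nc = NotificationCenter.default
            tokens.append(nc.addObserver(forName: UIApplication.didEnterBackgroundNotification,
                                         object: nil, queue: .main) { [weak self] _ in
                // Close the channel; in-flight RPCs fail and the app's existing
                // retry loops handle that as they do today.
                _ = self?.channel?.close()
                self?.channel = nil
            })
            tokens.append(nc.addObserver(forName: UIApplication.willEnterForegroundNotification,
                                         object: nil, queue: .main) { [weak self] _ in
                guard let self, self.channel == nil else { return }
                self.channel = try? self.makeChannel()
            })
        }
    }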

I am currently following a possible lead. Unfortunately, each time I make a code change, the next step is "wait for the problem to happen or not". The waiting period is sometimes hours, but I've seen it take 4 days.

In a nutshell, the "possible lead" is: I found a place in the code where, in a SwiftUI view init, I was starting a Task that makes a gRPC call. In other words, I was doing something that is:

  1. Very Bad
  2. Kind-of Related

It is not clear exactly how this bad code would cause the symptoms I have described, but it is credible that it might, especially because we don't have any better ideas. Because of the way SwiftUI initializes views, the effect as perceived by grpc-swift might be something like a grpc stream that was opened and then forever ignored, perhaps not properly cleaned up, or perhaps cleaned up with unpredictable timing. This is all conjecture. I need to go back to the log files where the problem appeared and look in the "other" areas of the log.
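
To make the pattern concrete, it was roughly this shape (illustrative only, not the actual view; fetchActiveGames is a hypothetical stand-in for the real gRPC call, and the second view shows one conventional way to tie the work to the view's lifetime instead):

    import SwiftUI

    // Illustrative only (not the actual app code): kicking off gRPC work from a
    // View's init. SwiftUI may create and discard view values many times, so
    // work started here has no predictable lifetime.
    struct GameListView: View {
        init() {
            Task {
                _ = try? await fetchActiveGames()  // hypothetical gRPC call
            }
        }
        var body: some View { Text("games") }
    }

    // One conventional alternative ties the work to the view's lifetime instead:
    struct GameListViewFixed: View {
        var body: some View {
            Text("games")
                .task {
                    _ = try? await fetchActiveGames()  // hypothetical gRPC call
                }
        }
    }

    // Hypothetical stand-in for the real generated-client call.
    func fetchActiveGames() async throws -> [String] { [] }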

Also, this lead fits my general expectation that "Eric doing something wrong" is the most likely cause of this problem, because I seem to be the only one seeing it.

I deployed a fix for that SwiftUI view init on Monday, and the problem has not been seen in 3 days. That is not conclusive, but I'm starting to allow myself a bit of optimism.

If in fact the problem never returns, it is possible that I have a fix but you do not. In other words, even if my code did something stupid, it MAY be the case that you want your code to handle it differently.

Anyway, I'll follow up again in a few more days. If the problem doesn't show up again, I'll provide as much detail as I can to help figure out how or if to go forward.

OK, over 7 days have passed since the problem last happened. It seems likely that the code fix referenced in the previous comment has made the issue go away.

Today I looked again at the logs from before and focused on the areas where the offending SwiftUI view would have been initialized. I hoped to see some useful info there, but I did not.

Should there be any further activity on this issue? I'll leave that decision up to you (the maintainers).

A sufficiently pedantic person might claim that grpc-swift should have handled this better, and that we need to dive in and fully understand where things went wrong. I detest mysteries in software, and I hate it when a problem just "goes away", so I have some sympathy for this perspective.

On the other hand, putting myself in your shoes, you've got a possible bug that (1) happened on code you haven't seen, and (2) cannot be easily reproduced, and (3) isn't really clear at all, and (4) was apparently triggered by a fairly bad piece of code, and (5) has apparently never been seen by anyone else, and (6) is no longer an issue for the person who reported it. That seems like approximately the dictionary definition of a low priority issue.

So, if you do want to investigate this further, I am willing to help. But I am explicitly not expressing any expectation that you should do so.

Thanks for your attention. If in fact you want to pause or close activity on this issue, I offer my apologies for what now seems somewhat like a waste of your time.

If you were somehow holding it wrong, then grpc-swift should, in an ideal world, be able to indicate this to you in some manner (e.g. via a fatal error in the worst case). So I consider it a defect that you can hold grpc-swift in such a way that it causes your application to get stuck. That said, without more information there's very little we can do to work out what's going on here, and I'm hesitant to ask you to spend more time gathering information because it's not clear to me that we're even looking in the right place.

> Thanks for your attention. If in fact you want to pause or close activity on this issue, I offer my apologies for what now seems somewhat like a waste of your time.

You're welcome and I absolutely don't consider it a waste of our time, it's always helpful to surface rough edges and areas we can improve. I also really appreciate that you've spent a bunch of time trying to debug this and have provided us with plenty of info along the way, thank you!