aws-amplify/amplify-swift

fetchAuthSession fails after device wakes

kparichan opened this issue · 18 comments

Describe the bug

In our analytic reports we see fetchAuthSession failing after a device wakes up, and has been asleep for some time.
Additional info provided the "Additional context" section below.

This is not a new issue for us. It's been showing up intermittently for the last several months, across various versions of amplify-swift.

Steps To Reproduce

There's no clear way to reproduce this. Just sleeping the device and waking it later with the app running does not reproduce this issue. It's something that we almost always notice in the field.

Expected behavior

Network connectivity is available, so fetchAuthSession should return a valid auth token or flag the user as not being signed in.

Amplify Framework Version

2.35.4

Amplify Categories

Auth

Dependency manager

Swift PM

Swift version

5.10

CLI version

12.1.0

Xcode version

15.4 (15F31d)

Relevant log output

No logs are available.

Is this a regression?

No

Regression additional context

No response

Platforms

iOS

OS Version

No specific OS version

Device

No specific device

Specific to simulators

No response

Additional context

Information originally included in response to issue #3540.

We're also seeing behavior in our app which seems to line up with this issues. I don't have full logs unfortunately, only what I've been able to piece together via analytics.

We have the same testers on iOS and Android, and the code is the same across platforms, but we've only ever seen this issue on iOS.

The issue appears to happen when calling fetchAuthSession right after the device wakes up, and has been asleep for some time.

While that call fails attempting to fetch/refresh the auth token, and with Amplify.Hub.subscribe setup, AuthChannelEventName.SESSION_EXPIRED does not appear to be triggered.

The failure to fetchAuthSession reports:
localizedDescription = "The operation couldn’t be completed. (Amplify.AuthError error 1.)"
print() = "network(AuthError: Unknown service error occurred Recovery suggestion: See the attached error for mo" (Truncated by analytics)

The only other info I have is a screenshot from a debug build which reports an error of "The network connection was lost." It was attempting to contact a url of "https://cognito-idp.us-west-2.amazonaws.com/".

I don't know how the internal networking is handled, but is the framework attempting to preflight network connectivity before actually making a request?

In general, that goes against the recommended behavior and could lead to an issue where connectivity checking fails because the device radios haven't been powered up yet right after wake.

This may also relate to another closed issue that my coworker pointed out:
#3437

Normally you'd just set waitsForConnectivity and a short timeoutIntervalForResource value for the URLSessionConfiguration and issue any calls without any preflight connectivity checking.

Apologies if this has been investigated previously.

Would you be able to share you code snippet that is being used upon device getting awaked. Additionally, could you also share the metrics that you are seeing, if there is any specific logs or errors that could help narrow down the issue.

Inside of a Task, it's basically just...

do {
    let authSession = try await Amplify.Auth.fetchAuthSession()
    // ... handle success
} catch {
    Debug.log("Fetch failed", error)
    // ... report failure
}

The error returned is what I mentioned in the context section above.

There are no logs or analytics that I can share. The error is logged via analytics, which is how I got those same results posted above.

And as I mentioned also, in this case, HubPayload.EventName.Auth.sessionExpired is NOT reported.

We also have this issue. However, I have not yet found out the exact cause and cannot reproduce it.

Thank you - it'd be great if you can share reproduction steps when you run into this issue again.

Today I had the problem again on one of our test devices.
After the app had been in the background overnight and I had then opened it.
Here are the logs from the console:

11:09:09.316133+0200 BeTidy Connection 9: received H2 RST_STREAM with error code: 1
11:09:09.316670+0200 BeTidy Task .<1> done using Connection 9
11:09:09.316787+0200 BeTidy [C9] event: client:connection_idle @60.558s
11:09:09.317085+0200 BeTidy Task .<1> can retry(N) with reason(2) for error [1:57]
11:09:09.317146+0200 BeTidy Task .<1> HTTP load failed, 247/0 bytes (error code: -1005 [1:57])
11:09:09.318141+0200 BeTidy Task .<1> summary for task failure {transaction_duration_ms=60559, response_status=200, connection=9, protocol="h2", domain_lookup_duration_ms=3, connect_duration_ms=513, secure_connection_duration_ms=120, private_relay=false, request_start_ms=522, request_duration_ms=0, response_start_ms=0, response_duration_ms=0, request_bytes=247, response_bytes=0, cache_hit=true}
11:09:09.318429+0200 BeTidy Task .<1> finished with error [-1005] Error Domain=NSURLErrorDomain Code=-1005 UserInfo={_kCFStreamErrorCodeKey=57, NSUnderlyingError=0x302a9c6c0 {Error Domain=kCFErrorDomainCFNetwork Code=-1005 UserInfo={NSErrorPeerAddressKey=, _kCFStreamErrorCodeKey=57, _kCFStreamErrorDomainKey=1}}, _NSURLErrorFailingURLSessionTaskErrorKey=, _NSURLErrorRelatedURLSessionTaskErrorKey=, NSLocalizedDescription=, NSErrorFailingURLStringKey=, NSErrorFailingURLKey=, _kCFStreamErrorDomainKey=1}
11:09:09.321650+0200 BeTidy Connection 9: cleaning up
11:09:09.321711+0200 BeTidy [C9 BC51CC96-4275-423D-88CB-A1CD50CE25D5 Hostname#11115440:443 quic-connection, url hash: 341d5f92, definite, attribution: developer] cancel
11:09:09.321880+0200 BeTidy [C9 BC51CC96-4275-423D-88CB-A1CD50CE25D5 Hostname#11115440:443 quic-connection, url hash: 341d5f92, definite, attribution: developer] cancelled
[C9.1.1.1 AD999B24-AF29-4143-8406-3F54F0743EEA 1111:11:1111:dd13:cdc6:a816:119c:1e14.49368<->IPv6#1111e982.443]
Connected Path: satisfied (Path is satisfied), viable, interface: pdp_ip0[endc_sub6], ipv4, ipv6, dns, expensive, uses cell
Privacy Stance: Not Eligible
Duration: 60.563s, DNS @0.001s took 0.003s, TCP @0.006s took 0.391s, TLS 1.3 took 0.120s
bytes in/out: 5794/984, packets in/out: 7/16, rtt: 0.060s, retransmitted bytes: 0, out-of-order bytes: 0
ecn packets sent/acked/marked/lost: 0/0/0/0
11:09:09.322395+0200 BeTidy nw_protocol_tcp_log_summary [C9.1.1.1:3]
[BA495FB6-CB13-438D-A8CF-972F51C5552A :49368<->:443]
Init: 1, Conn_Time: 391.349ms, SYNs: 4, WR_T: 0/0, RD_T: 0/0, TFO: 0/0/0, ECN: 0/0/0, Accurate ECN (client/server): Disabled/Disabled, TS: 1, TSO: 0
rtt_cache: process, rtt_upd: 4, rtt: 60.375ms, rtt_var: 37.437ms rtt_nc: 130.531ms, rtt_var_nc: 81.625ms base rtt: 23ms
ACKs-compressed: 0, ACKs delayed: 0 delayed ACKs sent: 0
11:09:09.322555+0200 BeTidy nw_flow_disconnected [C9 IPv6#1111e982.443 cancelled parent-flow ((null))] Output protocol disconnected
11:09:09.322695+0200 BeTidy Fetch identityId failed with error: Unexpected error occurred with message: Unknown error occurred
Recovery suggestion: This should not happen. There is a possibility that there is a bug if this error persists. Please take a look at https://github.com/aws-amplify/amplify-swift/issues to see if there are any existing issues that match your scenario, and file an issue with the details of the bug if there isn't.

Our code at this point looks like this:

do {
    let session = try await Amplify.Auth.fetchAuthSession()
    if let identityProvider = session as? AuthCognitoIdentityProvider {
        let identityIdResult = identityProvider.getIdentityId()
        switch identityIdResult {
            case .success(let identityId):
                // ... handle identityId              
            case .failure(let error):
		// ... here we get the error
                NSLog("Fetch identityId failed with error: \(error)")
                throw BTError.runtimeError(self._getErrorMessage(pError: error))
        }
    }
} catch {
    // ... handle error
}

One more piece of information.
We have added a retry dialog for this case, where we call Amplify.Auth.fetchAuthSession with forceRefresh and hoped that this would get the user out of the situation. But unfortunately that doesn't help either. The user has to kill the app completely and restart it to be able to use it normally again.

@tobias-feldmann Do you think you would be able to get verbose logs for the scenario that you faced? You can enable verbose logging to the console by doing this before calling Amplify.configure:

Amplify.Logging.logLevel = .verbose

@harsh62
I will give it a try. However, I can only very rarely reproduce the problem on our test devices. So it could take a while.