Yvand/EntraCP

Azure CP - Failing to claim access token reliably

Closed this issue · 11 comments

Hi Yvand, been using Azure CP on a pair of our farms and in the past month or so have started encountering the following error in our ULS logs with the people picker becoming unreliable when searching for Azure users;

"[AzureCP] Could not get access token before timeout of 15000ms for tenant 'example.onmicrosoft.com' on cloud instance 'ExampleInstance'"

This repeats about 7 or 8 times at which point it would then claim a token and work without issue for an hour. The same then repeats every time the token is refreshed. I updated a Pre-Prod instance to 19.0.20210211.1285 to take advantage of the advised improvements to token handling.

Initial Token Claim after Update:
The issue has persisted post upgrade on this Pre-Prod instance with the same behaviour and initial error. With some additional Graph related errors, we didn’t get these on the previous version.

Before Claiming the Token, one instance of:
"[AzureCP] Unexpected error occurred in Microsoft.Graph could not query tenant 'example.onmicrosoft.com': Microsoft.Graph.ServiceException: Code: InvalidAuthenticationToken Message: Access token is empty. Inner Error: AdditionalData: date: 2021…"

After Claiming the Token, Four instances of:

“[AzureCP] Unexpected error occurred Microsoft.Graph could not query tenant 'example.onmicrosoft.com': System.Threading.Tasks.TaskCanceledException: A task was canceled., Callstack:
at System.Runtime.CompilerService.Task.Awaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerService.Task.Awaiter.HandleNonSuccessAndDebuggerNotification (Task task)
at Microsoft.Graph.HttpProvider.<SendRequestAsync.d__19.MoveNext

People Picker will fail to return results saying it can’t reach the server until a search is done after the token claim.

When next claiming a token after an hour

The token takes 7 attempts once again to be claimed and new errors are presented by MicosoftGraph. These stop on their own if a new search is attempted.

Six Instances of after token claim:

[AzureCP] Unexpected error occurred Microsoft.Graph could not query tenant ‘example.onmicrosoft.com’: System.FormatException Cannot add value because header ‘Authorization’ does not support multiple values., Callstack:
at System.Net.Http.Headers.HttpHeaders.ParseAndAddValue(String name, HeaderStoreItemInfo info, String value)
at System.Net.Http.Headers.HttpHeaders.Add(String name, String value)
at azurecp.AADAppOnlyAuthenticationProvider.d__13.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Microsoft.Graph.AuthenticationHandler.d__15.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Graph.AuthenticationHandler.d__16.MoveNext() --- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Graph.HttpProvider.d__19.MoveNext()

Any insight or assistance that can be provided for this would be greatly appreciated.

Yvand commented

@Charlie-Gunn so it looks like that, for some reason, AzureCP can no longer connect to Azure AD.
Did you check https://yvand.github.io/AzureCP/Troubleshoot-AzureCP.html ?
It contains multiple scripts to test the connectivity to Azure.

@Yvand We just re-ran the Invoke-WebRequests as an App Pool account as well as previous attempts from administration accounts and all of them returned results without issue and virtually immediately

The second script for access tokens is virtually identical to a script we were provided for testing by our Developer. When we ran this during earlier testing no issues were identified here either.

However despite this success in PowerShell we are not seeing the same results from Azure CP.

Yvand commented

There is 1 more thing which you can do:
You can redirect the traffic of AzureCP through Fiddler to inspect the actual traffic between AzureCP and AAD.
If you installed Fiddler locally and trusted its root certificate, you can do it per-web application by updating the web.config:

<system.net>
    <defaultProxy useDefaultCredentials="True">
        <proxy proxyaddress="http://localhost:8888" bypassonlocal="False" />
    </defaultProxy>
</system.net>

We have not had a chance to get around to using fiddler but we noticed the Web App related to the App Pool we used for testing had stop exhibiting issues with claiming tokens. However another Web App in the same environment still had the problem. Waiting to confirm results on some testing but logging into the App Pool account may clear this issue for us.

@Yvand To update the above on the instance with 19.0.20210211.1285 we have been able to clear the connectivity issues on a per web app basis by remoting into the server with their unique AppPool account. Sadly we are unsure on what exactly this is changing but its clearly having a positive effect for us.

Yvand commented

@Charlie-Gunn do you mean that the issue was gone after opening a RDP session as the app pool account?

@Yvand Correct, after successfully RDPing to the Machine and logging in with the App Pool account, it successfully claims the token first time every time.

Yvand commented

@Charlie-Gunn it makes me immediately think about this problem.
You can take a look at the event viewer to check if you see such error, and I suggest you test the resolution in method 2.

@Yvand Sorry about the delay in replying. We had almost the opposite to the issue in the link provided where instead the interactive login was solving the problem rather than creating one.

We believe one of the registry entries added in this case is resolving our connectivity issue but were unable to identify which one exactly made the difference.

Yvand commented

@Charlie-Gunn yes it's part of the symptom: When you sign-in to the server it temporarily resolves the issue.
So this is very likely the actual issue and the solution suggested should work

@Yvand I keep this in mind gong forward in the event the issue returns. So far we have had continued availability on the platforms this was completed on and I have been asked to avoid further changes on this matter for now. Thank you for your help on this!