Yvand/EntraCP

AzureCP stops working after 31 days

Closed this issue · 10 comments

Hi @Yvand,

We are experiencing a weird issue with a single customer.
After almost exactly one month, AzureCP stops working. We assume the token is not being renewed, but we have no errors to prove it. The only workaround we know of is restarting the system.

The following error is the one we found that does not appear on other customers' systems:

[AzureCP] Unexpected error(s) occurred in AugmentEntity: [EXCEPTION 1]: System.Net.Http.HttpRequestException: An error occurred while sending the request.. Callstack:
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Graph.RedirectHandler.d__6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Graph.RetryHandler.d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Graph.CompressionHandler.d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Graph.AuthenticationHandler.d__16.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Graph.HttpProvider.d__19.MoveNext()
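Read from the bottom up, this call stack is a chain of delegating handlers: HttpProvider calls AuthenticationHandler, which calls CompressionHandler, then RetryHandler, then RedirectHandler. The exception is rethrown at each level, so every handler shows up in the trace even though the failure most likely originated below RedirectHandler, in the layer that actually sends the request. The Python sketch below is a hypothetical model of that unwinding, not Microsoft.Graph code; only the handler names come from the trace, while DelegatingHandler, failing_transport, and build_pipeline are illustrative names.

```python
# Hypothetical model of a delegating-handler pipeline. Handler names are taken
# from the call stack; everything else is illustrative, not Microsoft.Graph code.
from typing import Callable, List

Send = Callable[[str], str]


class DelegatingHandler:
    """Wraps an inner 'send' function, the way each handler wraps the next one."""

    def __init__(self, name: str, inner: Send, trace: List[str]) -> None:
        self.name = name
        self.inner = inner
        self.trace = trace

    def send(self, request: str) -> str:
        try:
            return self.inner(request)
        except Exception:
            # Record this frame as the exception passes through, then rethrow it.
            self.trace.append(self.name)
            raise


def failing_transport(request: str) -> str:
    # Stand-in for the transport: fails before any HTTP response exists.
    raise ConnectionError("An error occurred while sending the request.")


def build_pipeline(trace: List[str]) -> Send:
    send: Send = failing_transport
    # Wrap innermost first, so the outermost handler (HttpProvider) is called first.
    for name in ["RedirectHandler", "RetryHandler", "CompressionHandler",
                 "AuthenticationHandler", "HttpProvider"]:
        send = DelegatingHandler(name, send, trace).send
    return send


if __name__ == "__main__":
    frames: List[str] = []
    try:
        build_pipeline(frames)("GET /users")
    except ConnectionError as exc:
        print(exc)
        print(" <- ".join(frames))
```

Running it prints the error message followed by RedirectHandler <- RetryHandler <- CompressionHandler <- AuthenticationHandler <- HttpProvider, the same order as the frames in the log.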

Do you have any idea what could be causing this issue, and whether there is a fix we could try?

Many thanks in advance

Yvand commented

@Kinumikao once you restart the system, how long does it work before it throws this error again?
What version of AzureCP are you using?
Could you use Fiddler as a proxy to inspect the traffic between AzureCP and Azure AD when the error occurs?

@Yvand, thanks for the quick reply.
The version we use is up to date: 19.0.20210211.1285.
AzureCP works for almost exactly 31 days before the issue recurs. It has been occurring since around December of last year.
We would prefer not to run Fiddler on the customer's system, but since we cannot reproduce the issue in our test environments, we could arrange something with the customer as a last resort.

Yvand commented

How many tenants are configured?
Does the people picker continue to work when this error happens?
Does this exception happen only in the SharePoint STS process?

@Yvand, this system serves a single customer. We have multiple systems, each dedicated to one customer.
As far as I know, the people picker still resolves users who are already in the User Information List (UIL). Users who gain access through AAD groups are unable to sign in.
I'm not sure exactly where this exception happens. How could I check that?

Yvand commented

@Kinumikao the SharePoint logs always record the process ID along with the process name.
You can run the command cmd /r %systemroot%\system32\inetsrv\appcmd list wp to find which w3wp.exe process (and application pool) is running under that PID.
Since members of AAD groups cannot sign in when this happens, it is most likely the SharePoint STS process.
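Putting these steps together, the PID in each [AzureCP] log entry can be looked up in the appcmd output to see which application pool wrote it. The Python sketch below assumes the usual ULS layout (tab-separated columns, the process in the second column as "w3wp.exe (0x1A2C)", the message in the eighth) and appcmd lines of the form WP "<pid>" (applicationPool:<name>); both formats and the function names are assumptions to verify against the actual files.

```python
# Minimal sketch: attribute [AzureCP] ULS entries to IIS application pools.
# Assumed formats (check against real files):
#   ULS:    tab-separated; column 2 = "w3wp.exe (0x1A2C)", column 8 = message
#   appcmd: WP "6700" (applicationPool:SecurityTokenServiceApplicationPool)
import re
from typing import Dict, Iterable, List, Tuple

ULS_PROCESS = re.compile(r"^(?P<name>\S+)\s+\(0x(?P<pid>[0-9A-Fa-f]+)\)")
APPCMD_WP = re.compile(r'^WP "(?P<pid>\d+)" \(applicationPool:(?P<pool>[^)]+)\)')


def parse_appcmd_wp(output: str) -> Dict[int, str]:
    """Map worker-process PID (decimal, as appcmd prints it) to app pool name."""
    pools: Dict[int, str] = {}
    for line in output.splitlines():
        m = APPCMD_WP.match(line.strip())
        if m:
            pools[int(m.group("pid"))] = m.group("pool")
    return pools


def azurecp_entries_by_pool(uls_lines: Iterable[str],
                            pools: Dict[int, str]) -> List[Tuple[str, str]]:
    """Return (app pool, or process name if unknown, message) per [AzureCP] entry."""
    results: List[Tuple[str, str]] = []
    for line in uls_lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8 or "[AzureCP]" not in columns[7]:
            continue
        m = ULS_PROCESS.match(columns[1].strip())
        if not m:
            continue
        pid = int(m.group("pid"), 16)  # ULS writes the PID in hexadecimal
        results.append((pools.get(pid, m.group("name")), columns[7].strip()))
    return results
```

ULS writes the PID in hexadecimal while appcmd prints it in decimal (0x1A2C is 6700), which is easy to miss when matching them by hand.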

One more thing: IIS should restart the w3wp processes every day, so I don't see what, on the client side, could cause the issue to happen exactly every 31 days... It cannot be the access token, as it is valid for 1 hour and is reacquired each time the process restarts (so at least once a day).
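To make this reasoning concrete: if a token lives 1 hour and every new process starts with an empty cache at least once a day, no token can outlive a day, so a token that silently failed to renew would surface within hours, not after 31 days. The Python sketch below is a hypothetical model of that cache-and-refresh pattern, not AzureCP's code; TokenCache, the acquire callback, and the 5-minute refresh margin are assumptions for illustration.

```python
# Hypothetical sketch of the token pattern described above: acquire on first use,
# reuse until shortly before expiry, and start empty in every new process.
# TokenCache, acquire() and the 300 s margin are illustrative assumptions.
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

TOKEN_LIFETIME_SECONDS = 3600  # "valid for 1h"


@dataclass
class CachedToken:
    value: str
    expires_at: float


class TokenCache:
    def __init__(self, acquire: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.time,
                 margin_seconds: float = 300) -> None:
        self._acquire = acquire  # returns (token, lifetime in seconds)
        self._clock = clock
        self._margin = margin_seconds
        self._token: Optional[CachedToken] = None  # empty in every new process

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or it is within the margin of expiry.
        if self._token is None or now >= self._token.expires_at - self._margin:
            value, lifetime = self._acquire()
            self._token = CachedToken(value, now + lifetime)
        return self._token.value
```

Constructing a new TokenCache models a w3wp restart: its first get() always acquires a fresh token, regardless of what the previous process held.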

@Yvand, in the logs for the day the issue occurred, two entries list Process: "App Pool: SecurityTokenServiceApplicationPool". The remaining AzureCP entries show w3wp.exe, and that process does appear in the output of your command.

Interesting, good to know.
I could not find anything else that happens on a one-month interval. Is it technically possible that this originates on the customer's side (the Azure AD app registration)?

Yvand commented

@Kinumikao I cannot think of anything that could cause such an issue to happen every 31 days, client side or server side...
Especially if it affects only the STS, and other w3wp processes (which use the same code to get the same access token) work fine...

@Yvand, I understand. Is there a list of known issues for AzureCP? We will keep monitoring this issue, but for the time being, thanks a lot for your help.

Yvand commented

There is no such list of known issues, sorry.
Keep me posted on your findings.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.