Azure/AppConfiguration-DotnetProvider

System.AggregateException, Three OperationCanceledException due to timeout on version 6.1.0

Closed this issue · 12 comments

I´m trying to reduce the number of error on our appinsights and i've noticed that the AppConfig library throws multiple errors due to network timeouts , I thought that this was resolved in this PR #440 and it was an expected behavior but after update to 6.1.0 version the error persists.

The app calls 3 times to Get .azconfig.io/kv/*** then throws an network error and is grouped in an aggregated error

¿Is this the expected behavior or i´m doing something wrong?

Trace of the exception:
Command : ConfigurationClient.GetConfigurationSetting

System.AggregateException: Retry failed after 3 tries. Retry settings can be adjusted in ClientOptions.Retry. (The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.)
 ---> System.Threading.Tasks.TaskCanceledException: The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Azure.Core.Pipeline.HttpClientTransport.ProcessAsync(HttpMessage message, Boolean async)
   at Azure.Core.Pipeline.HttpPipelineTransportPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Core.Pipeline.RequestActivityPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   --- End of inner exception stack trace ---
   at Azure.Core.Pipeline.ResponseBodyPolicy.ThrowIfCancellationRequestedOrTimeout(CancellationToken originalToken, CancellationToken timeoutToken, Exception inner, TimeSpan timeout)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.<ProcessAsync>g__ProcessAsyncInner|4_0(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Data.AppConfiguration.AuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   --- End of inner exception stack trace ---
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Microsoft.Extensions.Configuration.AzureAppConfiguration.UserAgentHeaderPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.HttpPipeline.SendRequestAsync(Request request, CancellationToken cancellationToken)
   at Azure.Data.AppConfiguration.ConfigurationClient.GetConfigurationSettingAsync(String key, String label, Nullable`1 acceptDateTime, MatchConditions conditions, CancellationToken cancellationToken)
 ---> (Inner Exception #1) System.Threading.Tasks.TaskCanceledException: The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Azure.Core.Pipeline.HttpClientTransport.ProcessAsync(HttpMessage message, Boolean async)
   at Azure.Core.Pipeline.HttpPipelineTransportPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Core.Pipeline.RequestActivityPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   --- End of inner exception stack trace ---
   at Azure.Core.Pipeline.ResponseBodyPolicy.ThrowIfCancellationRequestedOrTimeout(CancellationToken originalToken, CancellationToken timeoutToken, Exception inner, TimeSpan timeout)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.<ProcessAsync>g__ProcessAsyncInner|4_0(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Data.AppConfiguration.AuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)<---

 ---> (Inner Exception #2) System.Threading.Tasks.TaskCanceledException: The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Azure.Core.Pipeline.HttpClientTransport.ProcessAsync(HttpMessage message, Boolean async)
   at Azure.Core.Pipeline.HttpPipelineTransportPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Core.Pipeline.RequestActivityPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   --- End of inner exception stack trace ---
   at Azure.Core.Pipeline.ResponseBodyPolicy.ThrowIfCancellationRequestedOrTimeout(CancellationToken originalToken, CancellationToken timeoutToken, Exception inner, TimeSpan timeout)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.<ProcessAsync>g__ProcessAsyncInner|4_0(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Data.AppConfiguration.AuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
   at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)<---

I´ve read the source code and it seems that TaskCanceledException is not catch but OperationCanceledException is, ¿Should TaskCanceledException be catched too?

Hello @Ismael-Pep, are you getting this error during application startup or during refresh?
Also, have you configured your application to failover to any replicas as described here?

Hi! This happens during refresh, we don't have any replicas configured since we are operating from only one location.
Are network timeouts an expected behavior from Azure?

Its difficult to confirm the cause of network timeouts without some tracing information like your client request ID for failed requests. Network timeouts could also mean that the client application is unable to send requests to AppConfig service.
I also want to understand how you're initiating refresh operation. Are you invoking RefreshAsync or TryRefreshAsync?
If you're invoking TryRefreshAsync, these errors should not be crashing your application since we catch all these exceptions before returning true/false from TryRefreshAsync.

The application isn´t crashing but app insights is filling with these errors and making difficult to identify what are a real errors.
I don´t actually call directy TryRefreshAsync nor RefreshAsync . I call AddAzureAppConfiguration and UseAzureAppConfiguration. Here is an example of what i'm seeing in appinsights
tempsnip

I have a few follow up questions:

- Do you see this timeout only for AppConfig requests or have you noticed this for other network operations too?
This happens in other network operations, but not a this volume of errors and are not persistent on retries.
- Do all requests to AppConfig timeout, or only some intermittent timeouts?
Only some of the request timeouts, for example in the last 30 days around the 33% dependency calls to appconfig failed.
- Would it be possible for you to capture the client request ID?
Here are some:
4daaed2e-468a-4bed-b498-c80bd253f8f7
232e5baf-f327-4f9a-99f5-eb808e2d5679
034dd054-62f6-4f59-a707-2468da801af7
- Did you try setting up logging to see if there are other more specific errors to help narrow this down? Please see this document on how to set that up: https://learn.microsoft.com/en-us/azure/azure-app-configuration/enable-dynamic-configuration-aspnet-core?tabs=core6x#logging-and-monitoring
I´ll enable this and get back to you as soons as I can.

Sorry for the late response, after activating a more verbose settings i have a trace with this message "Refresh skipped because no endpoint is accessible.". It is been difficult to trace this issue since there is correlation between the dependency calls and the traces.
I'm guessing this are all from the same request:
tempsnip

Did you managed to get any data from the operation ids?

The log trace indicates that at 14:48:47, the application failed to get configuration settings from your AppConfig store due to operation cancelled exception (or aggregate exception where all inner exceptions are operation cancelled exceptions).
When a refresh operation fails, we set a backoff period during which no calls are made to the server. Your application was in this backoff period from 14:48:55-14:49:15 because we did not even attempt to make calls to AppConfig during this time. This is expected behavior. The backoff period ensures that applications get some time to cool off after possibly transient issues like network timeouts. In most transient cases, applications recover automatically after a few minutes.

As for why you're seeing operation cancelled exceptions, I wasn't able to find your client request IDs in our server logs. This could mean 2 things:

  1. The request never reached AppConfig server, so it could be a client-side network issue.
  2. The request IDs are too old and we don't have those logs anymore. To confirm this, could you provide more recent client request IDs (within the last 28 days).

The client request that i provided were 2 days old at most,
Here are some more just to be sure:
03129543-6fd6-4bc9-b964-406a3c867216
022b8f4b-9cfa-40a3-9bb9-e5620df223bd
530ea1dc-6314-4307-927e-baea106c60a2
8c56e905-7f35-499a-8e93-b96b9c80d1e7

All of these are less than 24 hours old.

If you can´t see these request in your server logs I´ll assume that these are network issues.
Thank you for your time!

I couldn't find the new IDs too. Looks like your application might be facing transient network issues.

Okay, thank you so much.