grpc/grpc-dotnet

gRPC client stability issues under high load

salememd opened this issue · 9 comments

Our apps run on .NET 8 (C#) in Azure App Service. On every user HTTP request, our client app makes a gRPC call to the identity server to check permissions and other things. Under high volume, the gRPC client starts throwing the following exception:
Call failed with gRPC error status. Status code: 'Unknown', Message: 'Bad gRPC response. HTTP status code: 500'.
There are no other details in the exception.

Our client is configured as follows:

services.AddGrpcClient<IdentityPortalMiddleware.Protos.User.UserEndpoint.UserEndpointClient>(o =>
{
    o.Address = new Uri(Configuration["identity:MiddelwarePortalURL"]);
}).ConfigurePrimaryHttpMessageHandler(() =>
{
    // Allow the client to open additional HTTP/2 connections when the
    // concurrent stream limit on a connection is reached.
    var handler = new SocketsHttpHandler();
    handler.EnableMultipleHttp2Connections = true;
    return handler;
}).ConfigureChannel(e =>
{
    // Debug-level console logging for the gRPC channel.
    var loggerFactory = LoggerFactory.Create(logging =>
    {
        logging.AddConsole();
        logging.SetMinimumLevel(LogLevel.Debug);
    });
    e.LoggerFactory = loggerFactory;
});
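
One mitigation we are considering on the client (just a sketch of the standard Grpc.Net.Client.Configuration retry support, not something we have confirmed fixes this; the attempt counts and backoff values below are guesses) is adding a retry policy so transient failures are retried transparently:

// Requires: using Grpc.Core; using Grpc.Net.Client.Configuration;
services.AddGrpcClient<IdentityPortalMiddleware.Protos.User.UserEndpoint.UserEndpointClient>(o =>
{
    o.Address = new Uri(Configuration["identity:MiddelwarePortalURL"]);
}).ConfigureChannel(e =>
{
    e.ServiceConfig = new ServiceConfig
    {
        MethodConfigs =
        {
            new MethodConfig
            {
                Names = { MethodName.Default },
                RetryPolicy = new RetryPolicy
                {
                    MaxAttempts = 5,                                 // guessed value
                    InitialBackoff = TimeSpan.FromMilliseconds(200), // guessed value
                    MaxBackoff = TimeSpan.FromSeconds(2),            // guessed value
                    BackoffMultiplier = 2,
                    // Unavailable is the usual retryable code. Unknown is included only
                    // because the proxy surfaces the 500 as Unknown and the permission
                    // check is idempotent; drop it for non-idempotent methods.
                    RetryableStatusCodes = { StatusCode.Unavailable, StatusCode.Unknown }
                }
            }
        }
    };
});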

Client logs

2024-06-04T18:45:39.2106962Z Successfully picked subchannel id '5-1' with address xxxx.azurewebsites.net:443. Transport status: ActiveStream
2024-06-04T18:45:39.2107162Z info: Grpc.Net.Client.Internal.GrpcCall[3]
2024-06-04T18:45:39.2107357Z Call failed with gRPC error status. Status code: 'Unknown', Message: 'Bad gRPC response. HTTP status code: 500'.
2024-06-04T18:45:39.2107561Z dbug: Grpc.Net.Client.Internal.GrpcCall[4]
2024-06-04T18:45:39.2107726Z Finished gRPC call.
2024-06-04T18:45:39.2107915Z info: Grpc.Net.Client.Internal.GrpcCall[3]
2024-06-04T18:45:39.2108092Z Call failed with gRPC error status. Status code: 'Unknown', Message: 'Bad gRPC response. HTTP status code: 500'.
2024-06-04T18:45:39.2108292Z dbug: Grpc.Net.Client.Internal.GrpcCall[4]
2024-06-04T18:45:39.2108478Z Finished gRPC call.
2024-06-04T18:45:39.2108658Z info: Grpc.Net.Client.Internal.GrpcCall[3]
2024-06-04T18:45:39.2108846Z Call failed with gRPC error status. Status code: 'Unknown', Message: 'Bad gRPC response. HTTP status code: 500'.
2024-06-04T18:45:39.2109034Z dbug: Grpc.Net.Client.Internal.GrpcCall[4]
2024-06-04T18:45:39.2109230Z Finished gRPC call.

Testing code


var calls = new List<AsyncUnaryCall<GetPermissionsResponse>>();
for (int i = 0; i < 1000; i++)
{
    calls.Add(_client.GetPermissionsAsync(
        new GetPermissionsRequest { Id = Profile.Id, Privilege = (int)method, Actor = actor },
        new Metadata { { "Authorization", $"Bearer {token}" } }));
}

var r = await Task.WhenAll(calls.Select(c => c.ResponseHeadersAsync));
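
A variation of this test we may try (a sketch; the concurrency cap of 100 and the trailer dump are our own assumptions, not part of the original repro) awaits the full responses, logs everything the RpcException exposes, and throttles in-flight calls to see whether the failures correlate with the number of concurrent HTTP/2 streams:

// Requires: using Grpc.Core; using System.Linq;
var throttle = new SemaphoreSlim(100); // hypothetical cap, roughly one HTTP/2 connection's default stream limit
var failures = 0;

var tasks = Enumerable.Range(0, 1000).Select(async _ =>
{
    await throttle.WaitAsync();
    try
    {
        var call = _client.GetPermissionsAsync(
            new GetPermissionsRequest { Id = Profile.Id, Privilege = (int)method, Actor = actor },
            new Metadata { { "Authorization", $"Bearer {token}" } });
        return await call.ResponseAsync;
    }
    catch (RpcException ex)
    {
        Interlocked.Increment(ref failures);
        // Log everything the client knows: status code, detail, and any response trailers.
        Console.WriteLine($"{ex.StatusCode}: {ex.Status.Detail}");
        foreach (var trailer in ex.Trailers)
        {
            Console.WriteLine("  " + trailer);
        }
        return null;
    }
    finally
    {
        throttle.Release();
    }
}).ToList();

var responses = await Task.WhenAll(tasks);
Console.WriteLine($"Failed calls: {failures}/1000");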

There are no other details in the exception

A 500 status code is all the client can see. You need to look at the server to discover why it is sending a 500.

We thought so too. We enabled debug logging on the server side, but no exception or anything unusual showed up. Server CPU and memory stay at low levels, and the server processes gRPC requests from other services normally.
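
For reference, this is roughly what we mean by server-side debug logging (a sketch of the standard ASP.NET Core gRPC options, not our exact configuration):

// Server Program.cs (sketch): surface exception details in gRPC error responses
// and turn up logging for the gRPC and Kestrel categories.
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddGrpc(options =>
{
    // Includes exception messages in the Status detail sent to clients. Diagnostics only.
    options.EnableDetailedErrors = true;
});

builder.Logging.AddFilter("Grpc", LogLevel.Debug);
builder.Logging.AddFilter("Microsoft.AspNetCore.Server.Kestrel", LogLevel.Debug);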

Someone posted the same issue here five days ago:
https://learn.microsoft.com/en-sg/answers/questions/1688994/bad-grpc-response-http-status-code-500-error-with?source=docs

Is the problem inside Azure App Service + gRPC? The gRPC client is just reporting the response the server is giving it.

Both the client and server apps are inside Azure App Service Premium V3 (P2v3: 1) and in the same App Service plan.
We suspect that the "EnableMultipleHttp2Connections" option is not working as expected. When we added "ServerGarbageCollection" to the client app, the issue appeared less frequently.
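
If the connection handling really is the problem, one experiment we are considering (a sketch; the timeout values are guesses, and none of this is confirmed to fix the 500s) is recycling pooled connections and enabling HTTP/2 keep-alive pings on the handler:

services.AddGrpcClient<IdentityPortalMiddleware.Protos.User.UserEndpoint.UserEndpointClient>(o =>
{
    o.Address = new Uri(Configuration["identity:MiddelwarePortalURL"]);
}).ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
{
    EnableMultipleHttp2Connections = true,

    // Recycle pooled connections periodically so the client does not keep using a
    // connection the platform's front end may have silently dropped (values are guesses).
    PooledConnectionLifetime = TimeSpan.FromMinutes(5),
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1),

    // HTTP/2 keep-alive pings to detect dead connections while calls are in flight.
    KeepAlivePingDelay = TimeSpan.FromSeconds(30),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(10),
    KeepAlivePingPolicy = HttpKeepAlivePingPolicy.WithActiveRequests
});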

To be clear, there are layers between your gRPC server and your gRPC client. It's possible the 500 error is happening there. You will need to ask Azure folks what is happening.

I can't give you any extra information.

I just want to add this:

We deployed the server app to Azure Container Apps and the problem no longer appears. We're thinking about migrating our core apps to Azure Container Apps.

Thanks

@salememd which version of the libraries are you using? And have you switched the client, the server, or both to Azure Container Apps?

We are having the exact same issue; it started a few weeks ago and is now more frequent.

@daniil-korolev Everything is updated to the latest version on both the server and client side. We only moved the server app to Azure Container Apps; the client app is still on Azure App Service. We've also noticed the issue happening more frequently recently.