grpc/grpc-dotnet

Calling a blocking unary rpc from multiple threads

jurajsokol opened this issue · 14 comments

Hello,
I have an older app which I am porting from .Net framework to .Net and grpc. I have created a simple app to demonstrate how it works.
A blocking unary call is invoked from multiple threads. That does't work as it should. The question is, how should I use it correctly?

What version of gRPC and what language are you using?

2.59.0

What operating system (Linux, Windows,...) and version?

Windows 10 21H2 build 19044.3086

What runtime / compiler are you using (e.g. .NET Core SDK version dotnet --info)

Host:
Version: 8.0.0
Architecture: x64
Commit: 5535e31a71

.NET SDKs installed:
8.0.100

What did you do?

Blocking unary call is invoked from multiple threads in client:
Server service:

namespace Server
{
    public class RetrierService : Retrier.RetrierBase
    {
        public override async Task<Response> DeliverPackage(Package request, ServerCallContext context)
        {
            // simulate some work
            await Task.Delay(2000);

            return new Response() { Message = $"+ {request.Name}" };
        }
    }
}

Asp.Net core configuration

using Server;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddGrpc();
var app = builder.Build();

app.MapGrpcService<RetrierService>();
app.Run();

Client

using Grpc.Core;
using Grpc.Net.Client;
using Retry;

using var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new Retrier.RetrierClient(channel);

object printLock = new();
Action<string, ConsoleColor> printMessage = (message, color) =>
{
    lock (printLock)
    {
        Console.ForegroundColor = color;
        Console.Write(message);
        Console.WriteLine();
        Console.ResetColor();
    }
};

string[] itemList =
{
    "Secrets of Silicon Valley",
    "The Busy Executive's Database Guide",
    "Emotional Security: A New Algorithm",
    "Prolonged Data Deprivation: Four Case Studies",
    "Cooking with Computers: Surreptitious Balance Sheets",
    "Silicon Valley Gastronomic Treats",
    "Sushi, Anyone?",
    "Fifty Years in Buckingham Palace Kitchens",
    "But Is It User Friendly?",
    "You Can Combat Computer Stress!",
    "Is Anger the Enemy?",
    "Life Without Fear",
    "The Gourmet Microwave",
    "Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean",
    "The Psychology of Computer Cooking",
    "Straight Talk About Computers",
    "Computer Phobic AND Non-Phobic Individuals: Behavior Variations",
    "Net Etiquette"
};

while (true)
{
    foreach (var item in itemList)
    {
        new Thread(() =>
        {
            printMessage($"taskStarted: - {System.Environment.CurrentManagedThreadId}", ConsoleColor.Yellow);

            try
            {
                string result = result = client.DeliverPackage(new Package { Name = item }, deadline: DateTime.UtcNow.AddSeconds(5)).Message;
                printMessage($"Success {result}, {System.Environment.CurrentManagedThreadId}", ConsoleColor.Green);
            }
            catch (RpcException ex)
            {
                printMessage($"Rpc Exception : {ex.ToString()} - ThreadID : {System.Environment.CurrentManagedThreadId}", ConsoleColor.Red);
            }
            catch (Exception ex)
            {
                printMessage($"{ex.Message} - ThreadID : {System.Environment.CurrentManagedThreadId}", ConsoleColor.Red);
            }
        }).Start();
    }
}

Proto file

syntax = "proto3";

package retry;

service Retrier {
  rpc DeliverPackage (Package) returns (Response);
}

message Package {
  string name = 1;
}

message Response {
  string message = 1;
}

What did you expect to see?

Successfully call server and return result to client.

What did you see instead?

The server is called with dealy or it ends up with an error.

Server side:

 Microsoft.AspNetCore.Server.Kestrel[32]
      Connection id "0HN0BIROR9S9P", Request id "0HN0BIROR9S9P:0000001B": the application completed without reading the entire request body.
fail: Grpc.AspNetCore.Server.ServerCallHandler[6]
      Error when executing service method 'DeliverPackage'.
      System.Threading.Tasks.TaskCanceledException: The request was aborted
       ---> Microsoft.AspNetCore.Connections.ConnectionAbortedException: The HTTP/2 stream was reset by the application with error code CANCEL.
         at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
         at System.IO.Pipelines.Pipe.GetReadAsyncResult()
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2MessageBody.ReadAsync(CancellationToken cancellationToken)
         --- End of inner exception stack trace ---
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2MessageBody.ReadAsync(CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at Grpc.AspNetCore.Server.Internal.PipeExtensions.ReadSingleMessageAsync[T](PipeReader input, HttpContextServerCallContext serverCallContext, Func`2 deserializer)
         at Grpc.AspNetCore.Server.Internal.CallHandlers.UnaryServerCallHandler`3.HandleCallAsyncCore(HttpContext httpContext, HttpContextServerCallContext serverCallContext)
         at Grpc.AspNetCore.Server.Internal.CallHandlers.ServerCallHandlerBase`3.<HandleCallAsync>g__AwaitHandleCall|8_0(HttpContextServerCallContext serverCallContext, Method`2 method, Task handleCall)

Client side:

Grpc.Core.RpcException: Status(StatusCode=\"DeadlineExceeded\", Detail=\"\")
 at Grpc.Net.Client.Internal.HttpClientCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
at Grpc.Core.Interceptors.InterceptingCallInvoker.<BlockingUnaryCall>b__3_0[TRequest,TResponse](TRequest req, ClientInterceptorContext`2 ctx)
at Grpc.Core.ClientBase.ClientBaseConfiguration.ClientBaseConfigurationInterceptor.BlockingUnaryCall[TRequest,TResponse](TRequest request, ClientInterceptorContext`2 context, BlockingUnaryCallContinuation`2 continuation)
at Grpc.Core.Interceptors.InterceptingCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
at Retry.Retrier.RetrierClient.DeliverPackage(Package request, CallOptions options) in C:\\Users\\jsokol\\Desktop\\grpc-dotnet\\examples\\Retrier\\Client\\obj\\Debug\\net8.0\\RetryGrpc.cs:line 102
at Retry.Retrier.RetrierClient.DeliverPackage(Package request, M..."

Anything else we should know about your project / environment?

Your client continuously creates new threads to send requests to the server, without waiting for the results of previous threads.

Your server responds to each request with a 2 second delay.

Therefore the number of requests waiting for a response is continuously growing - faster than the server can respond. Some of the requests time out (you've specified a 5 second deadline). By the time the server gets around to processing the queued request it has already been terminated by the client.

On my laptop I get about 11,500 outstanding queued requests before I start seeing errors.

Is your client a realistic model of that you are trying to test?

We have a perhaps-identical problem. WIth as few as a dozen or so concurrent requests, performance drops dramatically. A single request takes 50ms. 50 in parallel take multiple seconds. 100 in parallel take half a minute each. The problem is not server response time. These are BlockingUnary calls.

We're in asp.net classic, dotnet 4.8., using the Grpc.Net.Client.Web handler for http 1.1. We are aware of the async/deadlock issue in this environment and have addressed it. Each request runs on a Task.Run() background task. That imposes a bit of overhead but does not explain this problem, to our knowledge.

Blocking inside Task.Run is still blocking a thread pool thread.

@JamesNK

Blocking inside Task.Run is still blocking a thread pool thread.

Sure, but that shouldn't be a problem except at very high numbers. BLocking a dozen or a hundred threads is, in the scheme of things, not a big deal. Right? Not ideal, sure, but shouldn't cause this level of degradation. And we can run very many concurrent requests using .net remoting with no problem, and afaik it blocks the thread while working,

@JamesNK

Blocking inside Task.Run is still blocking a thread pool thread.

Sure, but that shouldn't be a problem except at very high numbers. BLocking a dozen or a hundred threads is, in the scheme of things, not a big deal. Right? Not ideal, sure, but shouldn't cause this level of degradation. And we can run very many concurrent requests using .net remoting with no problem, and afaik it blocks the thread while working,

I have (as an experiment) run ThreadPool.SetMinThreads() to a very high number (10,000 !) , at app startup. THis should pre-build the pool size. No help.

Another data point from my situation: i've coded a client side interceptor on BlockingUnaryCall to record the duration of

`` response = continuation(request, newContext);

So this this the rpc call, running on the background task/thread, and the problem duration is recorded here. IDK what significance that has, but it seems to rule out any problem involving restoring a SyncronizationContext on an ASP.NET thread, because this code is running on a ThreadPool thread.

Perhaps it has a limit on the number of concurrent open sockets for parallel HTTP/1.1 requests in HttpClient? You could try creating an app that makes concurrent requests but doesn't use blocking and see whether that is better.

I don't know the behavior of HttpClient on .NET Framework.

We are now testing a dotnet 6 scenario, so http2 and async. In stress tests we still see significant degradation at what we would call moderate levels of concurrency. Maybe our expectations are unrealistic. Should we expect that a grpc client using http 2 could make 100 concurrent requests (unary calls) to a grpc server without bottlenecks ? What we see is that the time spent in the server for each call remains constant but the round trip becomes very slow. The payloads are very small. We are working our way through performance tips . Havent' tried everything yet but are we being unrealistic?

There are lots of ways it could be slow: thread pool starvation, exceeding maximum number of concurrent streams on a connection, GC overhead because of client GC mode.

And there are many benchmarks of fast gRPC.

grpc-dotnet\examples\Greeter\Client > dotnet run -c Release

100 parallel tasks making 10,000 requests each:

Requests sent: 1000000
Total time: 6.0929658 seconds
Average request time: 0.6073 ms
Shutting down
Press any key to exit...

master...JamesNK:grpc-dotnet:jamesnk/hello-world-simple-benchmark

If I change it to 10 tasks sending 100,000 requests each then total runtime increases but latency per request improves.

Requests sent: 1000000
Total time: 20.9581808 seconds
Average request time: 0.2092 ms
Shutting down
Press any key to exit...

Terrific and thanks so much for that. That and the other example code is going to be great to explore.

I'm trying to run that example and getting an exception in the client. " AuthenticationException: Cannot determine the frame size or a corrupted frame... " The only change I made to the provided code was to change, in the client, the connection port from 5001 to 5000, to match the server.

Grpc.Core.RpcException
HResult=0x80131500
Message=Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: The SSL connection could not be established, see inner exception. AuthenticationException: Cannot determine the frame size or a corrupted frame was received.", DebugException="System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.")
Source=Grpc.Net.Client
StackTrace:
at Grpc.Net.Client.Internal.GrpcCall`2.d__72.MoveNext()
...
This exception was originally thrown at this call stack:
System.Net.Security.SslStream.GetFrameSize(System.ReadOnlySpan)
System.Net.Security.SslStream.EnsureFullTlsFrameAsync(System.Threading.CancellationToken, int)
...
Inner Exception 1:
HttpRequestException: The SSL connection could not be established, see inner exception.

Inner Exception 2:
AuthenticationException: Cannot determine the frame size or a corrupted frame was received.

I have tested the code that was provided with one change. The .target framework was changed to .NET 6 in both client and server projects of the Greeter solution.
100*10000 iterations. This did not end for me at all till I hit Ctrl+C on the client.

I reduced it to run a 1000 iterations (10*100) for which I got the below results.
Requests sent: 1000
Total time: 5.0910933 seconds
Average request time: 50.532 ms

With .NET 8 target framework, for 1000 iterations, I get
Requests sent: 1000
Total time: 3.3985419 seconds
Average request time: 33.4761 ms

Finally when I let it run fully, I got this result (Target framework .NET 8)
Requests sent: 1000000
Total time: 1913.1776023 seconds
Average request time: 191.1947 ms

Processor: 13th Gen Intel(R) Core(TM) i7-1365U 1.80 GHz
Ram: 32.0 GB
OS details: Windows 10 Enterprise

UPDATE

I had server logging enabled for the above runs. Once the logging was removed, I got the below for 100*10000 calls
Requests sent: 1000000
Total time: 56.412362 seconds
Average request time: 5.63 ms

Forget about the postings from Adithya and I. I wasnt running with dotnet run --release, and he nmissed the code change where logging is done. We've now reprod your results and and tryng to detrmine why our app has different behavior.