Threads and asio-grpc
vangork opened this issue · 9 comments
Thank you for implementing this excellent project to provide a consolidated way of executing async gRPC calls and sending/receiving TCP packets asynchronously with the Boost.Asio library. I have only just begun to use Boost.Asio and have a couple of questions when using this library.
According to this link: https://www.boost.org/doc/libs/1_78_0/doc/html/boost_asio/overview/core/threads.html, multiple threads may call io_context::run() to set up a thread pool and the io_context may distribute work across them. Does asio-grpc's execution_context also guarantee thread safety if a thread pool is enabled on it? I am using C++20 coroutines and assuming that each co_spawn will pick a thread from the thread pool and run the composed asynchronous operation on it. Correct me if my understanding is wrong. What if the composed asynchronous operation contains a blocking operation? It may block the running thread, so how can I prevent other co_spawn calls from using the blocked thread for execution? In addition, co_spawn can spawn from both an execution_context and an executor. I am guessing that if spawned from an execution_context it will pick a new thread to run on, while if spawned from an executor it will just run on the thread that the executor is running on. Is my guess correct?
Meanwhile #8 mentions that if you co_spawn a non-gRPC async operation like a steady_timer from the grpc_context, it will automatically spawn a 2nd io_context thread. So it seems that asio-grpc internally maintains two threads, one for the gRPC execution_context and one for the io_context, to run async gRPC operations and other async non-gRPC operations. The last comment also says that version 1.4 would support asking an io_context for an agrpc::GrpcContext. Considering that my application will serve many clients, and that for each client's request it issues one single composed asynchronous operation containing one async gRPC call and several async TCP reads/writes to the server before responding back to the client, will asio-grpc guarantee that there is no interleaving between the gRPC operation and the TCP operations when this single composed asynchronous operation is co_spawned from either the grpc_context or the io_context, given that they are two contexts on two threads? Also, does asio-grpc support having a thread pool for the io_context and a single thread for the grpc_context, or thread pools for both?
```
                  one single composed asynchronous operation
                 /                                           \
client1 --> { co_await async grpc operation, co_await async tcp operations } --> server
client2 --> { co_await async grpc operation, co_await async tcp operations } --> server
clientN ...
```
Hope to get some guidance from you. Thanks.
Hi, thank you for your positive feedback. I count five questions :)
- Does asio-grpc's execution_context also guarantee thread safety if a thread pool is enabled on it?
At the moment the GrpcContext may only be run on one thread at a time, see the documentation of run(). This is because gRPC used to recommend using one grpc::CompletionQueue per thread. (They now recommend two threads per completion queue, but I have not implemented support for that yet. I also assume that the extra synchronization needed in both library and user code would outweigh the potential performance benefit.)
You are correct that asio::io_context chooses one of the threads that call run() when co_spawning a coroutine.
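To illustrate the resulting threading model, here is a minimal sketch. It assumes the asio-grpc 1.x style of constructing a GrpcContext from a grpc::CompletionQueue; the thread count is arbitrary:

```cpp
asio::io_context io_context;
auto guard = asio::make_work_guard(io_context);
agrpc::GrpcContext grpc_context{std::make_unique<grpc::CompletionQueue>()};

// The io_context may be driven by a pool of threads ...
std::vector<std::thread> io_threads;
for (int i = 0; i != 4; ++i)
{
    io_threads.emplace_back(
        [&]
        {
            io_context.run();
        });
}

// ... but the GrpcContext may only be run by one thread at a time.
grpc_context.run();

guard.reset();
for (auto& thread : io_threads)
{
    thread.join();
}
```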
- What if the composed asynchronous operation contains a blocking operation? It may block the running thread, so how can I prevent other co_spawn calls from using the blocked thread for execution?
True blocking is generally bad for both the asio::io_context and the GrpcContext. By true blocking I mean things like waiting for a std::future or calling std::this_thread::sleep_for. Other "blocking" operations like async_waiting on an asio::steady_timer are fine. If you have truly blocking functions then you could post them onto a thread_pool:
```cpp
asio::thread_pool thread_pool;
asio::co_spawn(
    grpc_context,
    [&]() -> asio::awaitable<void>
    {
        co_await asio::post(asio::bind_executor(thread_pool, asio::use_awaitable));
        // Now executing on a thread of the thread_pool, perform blocking tasks here.
        // Optionally switch back to the GrpcContext explicitly:
        co_await asio::post(asio::use_awaitable);
        // Or switch back implicitly by using an RPC function like:
        co_await agrpc::finish(writer, grpc::Status::OK, asio::use_awaitable);
    },
    asio::detached);
```
- I am guessing that if spawned from an execution_context it will pick a new thread to run on, while if spawned from an executor it will just run on the thread that the executor is running on. Is my guess correct?
No, the behavior of co_spawn with an executor and co_spawn with an execution_context is identical. This applies to both asio::io_context and GrpcContext.
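For illustration, the following two calls behave identically; my_coroutine here stands for any function returning an asio::awaitable<void>:

```cpp
asio::io_context io_context;

// co_spawn from the execution_context ...
asio::co_spawn(io_context, my_coroutine(), asio::detached);
// ... and co_spawn from its executor do exactly the same thing.
asio::co_spawn(io_context.get_executor(), my_coroutine(), asio::detached);
```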
- Considering my application will serve many clients, and for each client's request it issues one single composed asynchronous operation containing one async gRPC call and several async TCP reads/writes, will asio-grpc guarantee that there is no interleaving between the gRPC operation and the TCP operations when this operation is co_spawned from either the grpc_context or the io_context, given that they are two contexts on two threads?
I am not sure what you mean by 'interleave', but there won't be any race conditions. I personally do not like how asio::execution_context creates a background thread when the GrpcContext is used for asio I/O objects like steady_timer. I prefer to manage the io_context and GrpcContext explicitly.
If your application handles RPC requests but performs most of its work on an io_context then you could do:
```cpp
agrpc::repeatedly_request(
    &test::v1::Test::AsyncService::RequestUnary, service,
    asio::bind_executor(
        grpc_context,
        [&](grpc::ServerContext& server_context, test::msg::Request& request,
            grpc::ServerAsyncResponseWriter<test::msg::Response>& writer) -> asio::awaitable<void>
        {
            // Just an example, could be a tcp_socket etc.
            asio::steady_timer timer{io_context, std::chrono::milliseconds(10)};
            co_await timer.async_wait(asio::bind_executor(io_context, asio::use_awaitable));
            // By using bind_executor, execution will not switch back to the GrpcContext
            // when the timer expires, which might provide better performance if you perform
            // more io_context-related operations afterwards.
            // Eventually finish the RPC.
            test::msg::Response response;
            co_await agrpc::finish(writer, response, grpc::Status::OK, asio::use_awaitable);
        }));
```
Or co_spawn onto the io_context directly. Note that you must use asio::bind_executor for all functions of this library in that case:
```cpp
asio::co_spawn(
    io_context,
    [&]() -> asio::awaitable<void>
    {
        test::msg::Request request;
        grpc::ServerAsyncResponseWriter<test::msg::Response> writer{&server_context};
        if (!co_await agrpc::request(&test::v1::Test::AsyncService::RequestUnary, service,
                                     server_context, request, writer,
                                     asio::bind_executor(grpc_context, asio::use_awaitable)))
        {
            co_return;  // gRPC server is shutting down
        }
        // Now executing on a thread of the GrpcContext.
        // co_spawn the next agrpc::request here.
        // agrpc::repeatedly_request does that automatically but it cannot yet be
        // customized to co_spawn onto an io_context directly.
        // Perform your io_context-related tasks. The first awaited async operation
        // will automatically switch execution to the io_context.
        asio::steady_timer timer{io_context, std::chrono::milliseconds(10)};
        co_await timer.async_wait(asio::use_awaitable);
        // Now executing on a thread of the io_context.
        // Eventually finish the RPC.
        test::msg::Response response;
        co_await agrpc::finish(writer, response, grpc::Status::OK,
                               asio::bind_executor(grpc_context, asio::use_awaitable));
    },
    asio::detached);
```
- Also, does asio-grpc support having a thread pool for the io_context and a single thread for the grpc_context, or thread pools for both?
As mentioned earlier, the GrpcContext currently does not support thread pools and can only use a single thread for RPCs. Even if it did, you would have full control over how many threads are being used by calling run() on as many threads as you want.
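In the meantime, the usual way to use more threads for RPCs is to create one GrpcContext per thread. A sketch, using the same asio-grpc 1.x construction style as in the earlier sketch:

```cpp
std::vector<std::unique_ptr<agrpc::GrpcContext>> grpc_contexts;
for (int i = 0; i != 2; ++i)
{
    grpc_contexts.emplace_back(
        std::make_unique<agrpc::GrpcContext>(std::make_unique<grpc::CompletionQueue>()));
}

std::vector<std::thread> threads;
for (auto& grpc_context : grpc_contexts)
{
    // Each GrpcContext is driven by exactly one thread.
    threads.emplace_back(
        [&grpc_context]
        {
            grpc_context->run();
        });
}
for (auto& thread : threads)
{
    thread.join();
}
```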
Here is another variation of the last example I gave, one that takes care of co_spawning the next agrpc::request in a timely fashion:
```cpp
asio::io_context io_context;
auto request_handler = [&](grpc::ServerContext& server_context, test::msg::Request& request,
                           grpc::ServerAsyncResponseWriter<test::msg::Response>& writer)
    -> asio::awaitable<void>
{
    // Now executing on a thread of the io_context.
    // Perform your io_context-related tasks.
    // Eventually finish the RPC.
    test::msg::Response response;
    co_await agrpc::finish(writer, response, grpc::Status::OK,
                           asio::bind_executor(grpc_context, asio::use_awaitable));
};
agrpc::repeatedly_request(
    &test::v1::Test::AsyncService::RequestUnary, service,
    asio::bind_executor(grpc_context,
                        [&]<class T>(agrpc::RepeatedlyRequestContext<T>&& context)
                        {
                            asio::co_spawn(
                                io_context,
                                [&, context = std::move(context)]()
                                {
                                    return std::apply(request_handler, context.args());
                                },
                                asio::detached);
                        }));
```
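To run this example, both contexts need to be driven. A sketch, not part of the original example; the work guard keeps io_context.run() from returning before the first request handler has been co_spawned onto it:

```cpp
auto guard = asio::make_work_guard(io_context);
std::thread io_thread{[&]
                      {
                          io_context.run();
                      }};
grpc_context.run();
guard.reset();
io_thread.join();
```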
Thank you for the very detailed and quick response. I got the point that the GrpcContext uses a single thread which works like a strand, and that asio::bind_executor will dispatch the async gRPC calls to the GrpcContext one by one.
Another question regarding the gRPC calls: to increase the concurrency of gRPC calls, is it possible to enable multiple connections and a load balancing policy in a gRPC channel via grpc::CreateCustomChannel? And would that give better performance?
I cannot find an option to enable multiple connections among the channel arguments, where did you read about it? I also cannot see how it would benefit HTTP/2 connections, which already multiplex requests within one connection anyway.
I wrote a simple benchmark for a helloworld client here. On my machine it seems that:
- Creating multiple channels does not improve throughput, e.g. one channel per thread. (Even with the GRPC_ARG_USE_LOCAL_SUBCHANNEL_POOL option, see the sketch below; in fact that option reduces performance in this case.)
- Creating multiple stubs does not improve throughput, e.g. one stub per thread.
- Creating multiple agrpc::GrpcContexts (one per thread) improves throughput as expected.
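For reference, this is how that channel argument can be set when creating a channel; the target address and credentials are placeholders:

```cpp
grpc::ChannelArguments arguments;
arguments.SetInt(GRPC_ARG_USE_LOCAL_SUBCHANNEL_POOL, 1);
auto channel = grpc::CreateCustomChannel("localhost:50051",
                                         grpc::InsecureChannelCredentials(), arguments);
```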
I have not used load balancing policies yet, so I cannot help here. You could try to get an answer in the gRPC mailing group or by creating an issue on grpc/grpc.
Hi Tradias, I tested the async gRPC call performance in multiple coroutines and created https://github.com/vangork/test-async-grpc/blob/main/test-grpc-cpp/test.cpp. It can achieve a higher requests/s rate than doing the calls serially in one single post: https://github.com/vangork/test-async-grpc/blob/main/test-grpc-cpp-benchmark/test-benchmark.cpp (customized based on your benchmark). But in this case, is the std::unique_ptr<pravega_grpc::ControllerService::Stub> stub_ thread-safe, or is it necessary to put a RW lock over it since it is shared between the coroutines, or is the lock unnecessary because the grpc context only runs on a single thread even though the stub is not thread-safe?
Regarding the multiple connections, I also can't find a method or property to set them. But I just came across the gRPC Performance Best Practices doc, which mentions: "Each gRPC channel uses 0 or more HTTP/2 connections." It also mentions "Create a separate channel for each area of high load"; I am not sure if one stub can connect to a channel pool?
The commonly used gRPC library for the Rust language, tonic, supports creating a channel with balance_list: https://github.com/vangork/test-async-grpc/blob/e558d84592a6d90aec8b67389e5a302aabb3e013/test-grpc-rust/src/main.rs#L26-L30. FYI
Better performance is expected since your example uses higher concurrency. I must say I do not know how to properly benchmark a gRPC client; I have only ever benchmarked servers using ghz. ghz differentiates between the number of CPUs, the maximum number of parallel in-flight requests (they call this 'concurrency') and the number of connections (I assume this refers to the number of Channels). If we want to benchmark a gRPC client then we probably want to make all of these things configurable, so that we can play around with different values and see what works best.
The Stub comes from gRPC itself and from what I can see in its implementation it seems to be thread-safe. In your example there is only one thread that uses the Stub, so no sharing takes place anyway.
Creating a pool of Channels and distributing work among them sounds easy enough. I also found this recommendation in the .NET gRPC library:
Use a pool of gRPC channels, for example, create a list of gRPC channels. Random is used to pick a channel from the list each time a gRPC channel is needed. Using Random randomly distributes calls over multiple connections.
https://docs.microsoft.com/en-us/aspnet/core/grpc/performance?view=aspnetcore-6.0#connection-concurrency
I would personally use a round-robin strategy instead of Random.
I would create one Stub per Channel in such a pool (similar to here), instead of worrying about attaching one Stub to multiple Channels. A Stub is actually rather lightweight, so that shouldn't be an issue.
I think such a channel pool should not be part of asio-grpc seeing that it can be implemented entirely without asio-grpc and seeing that the gRPC team is working on making it unnecessary (see 'side note').
I have adjusted the example here to create a configurable amount of GrpcContexts and Channels and round-robin distribute work among them. I hope that provides an idea on how to possibly implement it on your side.
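To make the idea concrete, here is a minimal sketch of such a round-robin Channel/Stub pool. It is independent of asio-grpc; the class name and the service type in the usage note are illustrative:

```cpp
#include <grpcpp/grpcpp.h>

#include <atomic>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

template <class Service>
class ChannelStubPool
{
  public:
    ChannelStubPool(const std::string& host, std::size_t size)
    {
        for (std::size_t i = 0; i != size; ++i)
        {
            // One Stub per Channel, as recommended above. Note that C++ channels with
            // identical arguments may share the underlying connection, see the
            // GRPC_ARG_USE_LOCAL_SUBCHANNEL_POOL discussion earlier in this thread.
            stubs_.emplace_back(Service::NewStub(
                grpc::CreateChannel(host, grpc::InsecureChannelCredentials())));
        }
    }

    // Round-robin instead of the Random suggested by the .NET documentation.
    typename Service::Stub& next_stub()
    {
        return *stubs_[counter_.fetch_add(1, std::memory_order_relaxed) % stubs_.size()];
    }

  private:
    std::vector<std::unique_ptr<typename Service::Stub>> stubs_;
    std::atomic_size_t counter_{};
};

// Usage, assuming the test service from the earlier examples:
// ChannelStubPool<test::v1::Test> pool{"localhost:50051", 4};
// auto& stub = pool.next_stub();
```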
Thanks very much!
I will close the issue here. If you have more questions or requests then do not hesitate to open another issue :).