ClickHouse/clickhouse-cpp

New feature needed: support coroutines (co_yield, C++20)

asialiugf opened this issue · 6 comments

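    // Proposed usage, not supported today: Select() takes a SelectCallback,
    // i.e. std::function<void(const Block&)>, so the lambda cannot be a coroutine.
    // Note that std::generator itself is C++23 (<generator>).
    client.Select("SELECT id, name FROM default.numbers", [] (const Block& block) -> std::generator<Block>
        {
            for (size_t i = 0; i < block.GetRowCount(); ++i) {
                std::cout << block[0]->As<ColumnUInt64>()->At(i) << " "
                          << block[1]->As<ColumnString>()->At(i) << "\n";
            }
            co_yield block;  // suspend after each block; resume on the next event-loop pass
        }
    );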

Would you mind giving more context about what you need to implement?

I have a WebSocket server with a single-threaded event loop (one thread for all users and all client sockets, like Node.js). When the event loop receives a user's request, it selects data from ClickHouse and sends the response.

The event loop runs over and over, servicing the users' sockets (I call the iterations the first loop, the second loop, ...).

If one user's result is huge, the other users are blocked, because a single event loop serves all sockets.

So I hope coroutines can help. For example, if a user's result from ClickHouse is large, it could be split into several parts that are sent one by one. After the first part is sent to the user in the first loop, the ClickHouse select would co_yield, and the event loop would move on to other users' requests. When the first loop ends, the event loop runs the second loop, in which the select resumes and sends the next part to this user, and so on until all parts have been sent.

In a coroutine generator, the handler function must have a coroutine return type such as std::generator.

But here the handler is a lambda passed to client.Select(...), and I don't know how to define the generator.

If you have time, you can refer to this:
https://github.com/lewissbaker/generator
An example:

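        // std::generator is C++23; lewissbaker/generator provides the same class template.
        #include <cstdio>
        #include <generator>

        int intn = 10;
        // Pass intn as a parameter: parameters are copied into the coroutine frame, whereas
        // captures of a temporary lambda dangle once the lazily started body first runs.
        std::generator<int> g_int = [](int n) -> std::generator<int> {
            for (int i = 0; i < n; i++) co_yield i;
        }(intn);
        auto it_int = g_int.begin();
        for (int i = 0; i < 30; ++i) {
            if (it_int == g_int.end()) break;
            int m_int = *it_int;  // copy before advancing: ++ invalidates the yielded reference
            ++it_int;
            printf("%d ---- int !!\n", m_int);
        }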

Thank you!

A good idea

The checklist:

  1. The async interface is a large feature with a lot of networking code, so I plan to use standalone asio (contrib/asio).
  2. Replace the present networking code (retaining the sync interfaces).
  3. Add async interfaces (query, insert, and so on).
  4. Provide cluster interfaces (health detection, load-balanced writing, etc.) based on the asynchronous interfaces.
  5. Coroutine interfaces built on asio are then cheap to add.

@Enmk Please give me your advice, thanks

Because a single request-response involves repeated network round trips, the async interfaces will use coroutines internally.
So C++20 is necessary, and clickhouse-cpp would move to v3.0 😅

Enmk commented

I don't think that health checks (of the CH server, I assume) and load balancing (aside from #310) should be implemented in this library.

Also, since one Client instance means one connection (Client effectively uses one socket), IMO adding an async interface to the Client itself might be a high-effort, low-benefit endeavor.

Perhaps one might want to implement some sort of AsyncClient on top of the Client, using one Client per request, either in a throw-away manner or with a pool (it is important to reuse only Client instances that didn't raise exceptions). That also doesn't break existing user code and doesn't require a full rewrite of the networking code.
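A minimal sketch of the pooled variant, assuming one Client per in-flight request. ClientPool, RunAsync and IdleCount are hypothetical names, not part of clickhouse-cpp; ClientT stands in for clickhouse::Client, and the factory could, for example, return std::make_unique<clickhouse::Client>(ClientOptions().SetHost("localhost")). A client is returned to the pool only if the request finished without throwing; otherwise it is destroyed.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical pool; ClientT stands in for clickhouse::Client (one Client = one connection).
template <typename ClientT>
class ClientPool {
public:
    using Factory = std::function<std::unique_ptr<ClientT>()>;

    explicit ClientPool(Factory make_client) : make_client_(std::move(make_client)) {}

    // Runs `request(client)` on another thread with exactly one Client for the request.
    // Assumes `request` returns a value (not void). Exceptions reach the caller via future::get().
    template <typename Fn>
    auto RunAsync(Fn request) -> std::future<decltype(request(std::declval<ClientT&>()))> {
        return std::async(std::launch::async, [this, request = std::move(request)]() mutable {
            std::unique_ptr<ClientT> client = Acquire();
            auto result = request(*client);  // if this throws, `client` is destroyed, never reused
            Release(std::move(client));      // only clients that did not throw go back to the pool
            return result;
        });
    }

    std::size_t IdleCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    std::unique_ptr<ClientT> Acquire() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!idle_.empty()) {
                std::unique_ptr<ClientT> client = std::move(idle_.back());
                idle_.pop_back();
                return client;
            }
        }
        return make_client_();  // pool is empty: open a new connection outside the lock
    }

    void Release(std::unique_ptr<ClientT> client) {
        std::lock_guard<std::mutex> lock(mutex_);
        idle_.push_back(std::move(client));
    }

    Factory make_client_;
    std::mutex mutex_;
    std::vector<std::unique_ptr<ClientT>> idle_;
};
```

The pool must outlive every future it hands out, since the worker threads call back into it. Using std::async keeps the sketch short; a real server would more likely dispatch onto its own thread pool or event loop, but the reuse rule (return a Client only after a clean request) stays the same.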