Should switch from polling to C-level select() or similar
wch opened this issue · 3 comments
Currently, websocket uses polling (via callbacks scheduled with the later package) to check for incoming messages. On my computer, an otherwise-idle R process consumes 17% of the CPU when there is an active websocket that is polling. We should switch to a C-level select() (or something similar) to reduce the CPU usage.
Now that I've done a little bit more research, I think that perhaps the way to go is to start a separate thread for each websocket and call .run(), which blocks and runs the event loop on that thread. The C++-level callback when a message arrives would then schedule an R-level callback (using later) on the main thread to handle the message.
https://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/overview/core/threads.html
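For concreteness, here is a rough sketch of that design (not the package's actual implementation), assuming websocketpp's asio_client config and the later package's C++ API (later::later from later_api.h). The handle_message_on_main_thread and start_websocket_thread names are made up for illustration:

// Sketch only: one websocketpp client per background thread; messages are
// forwarded to the main R thread with later so R code never runs off-thread.
#include <string>
#include <thread>
#include <websocketpp/config/asio_no_tls_client.hpp>
#include <websocketpp/client.hpp>
#include <later_api.h>

typedef websocketpp::client<websocketpp::config::asio_client> ws_client;

// Runs on the main R thread (scheduled via later), so it is safe to call back
// into R here, e.g. to invoke the R-level onMessage callback.
static void handle_message_on_main_thread(void* data) {
   std::string* payload = static_cast<std::string*>(data);
   // ... invoke the R-level onMessage callback with *payload ...
   delete payload;
}

// Hypothetical setup function; `client` must outlive the detached thread.
void start_websocket_thread(ws_client& client, const std::string& uri) {
   client.init_asio();

   // This handler fires on the background thread; it only copies the payload
   // and hands it to later, which queues the real work on the R thread.
   client.set_message_handler(
      [](websocketpp::connection_hdl, ws_client::message_ptr msg) {
         std::string* payload = new std::string(msg->get_payload());
         later::later(handle_message_on_main_thread, payload, 0);
      });

   websocketpp::lib::error_code ec;
   ws_client::connection_ptr con = client.get_connection(uri, ec);
   if (ec) return;
   client.connect(con);

   // run() blocks, so give this connection its own event-loop thread.
   std::thread([&client]() { client.run(); }).detach();
}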
You don't necessarily need a separate thread for each websocket. I believe you can listen on multiple websockets with a single call to io_service::run (the io_service is really a global task scheduler). For concurrency, you can call io_service::run on multiple threads at the same time, effectively creating a thread pool rather than a thread per service or per websocket.
Here's the function that runs the io_service in RStudio Server (https://github.com/rstudio/rstudio/blob/aa907461f068420805b7d819e7399d8c108a21a2/src/cpp/core/include/core/http/AsyncServerImpl.hpp#L247-L256):
void runServiceThread()
{
   try
   {
      boost::system::error_code ec;
      acceptorService_.ioService().run(ec);
      if (ec)
         LOG_ERROR(Error(ec, ERROR_LOCATION));
   }
   CATCH_UNEXPECTED_EXCEPTION
}
Then here's where we create the thread pool. By default we use 2 threads, but this can be whatever the application requires for concurrency, including 1 thread if all blocking operations are async (https://github.com/rstudio/rstudio/blob/aa907461f068420805b7d819e7399d8c108a21a2/src/cpp/core/include/core/http/AsyncServerImpl.hpp#L187-L197):
// create the threads
for (std::size_t i = 0; i < threadPoolSize; ++i)
{
   // run the thread
   boost::shared_ptr<boost::thread> pThread(new boost::thread(
         &AsyncServerImpl<ProtocolType>::runServiceThread,
         this));

   // add to list of threads
   threads_.push_back(pThread);
}
I spent some time digging into this with @wch and found the following:
- It turns out that the 17% CPU cited above is observed when using chromote. By itself, websocket contributes only ~1% CPU usage on an idle socket, so this is a much lower priority than we'd expected. When I opened 30 websockets in parallel, I saw usage climb to ~50%, but I'm not sure that's a use case we care about supporting explicitly. After moving the polling out of R, I saw no noticeable CPU usage for idle threads.
- We discussed two designs: (1) move each websocket into its own thread, or (2) create a background thread in which all websocket interactions happen. We pursued option 1 first but hit a few walls. The documentation about what exactly is and is not thread-safe in websocketpp leaves some open questions that left me nervous about interacting with these objects across threads. WIP with a couple of commits here. Option 2 seemed a bit simpler and -- given our expected usage -- would probably be sufficient for performance. We'd use a pattern similar to what we have in httpuv: pass messages from R to C++ through a threadsafe queue, and use later to invoke R callbacks from C++ (see the sketch after this list). Very nascent WIP here.
- If performance is going to be a focus for this package, we should probably have automated test suites covering that. I saw that the author of websocketpp also has a package focusing on websocket performance (either client or server) that we could leverage if we wanted to go that route.
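To make option 2 concrete, here is a very rough sketch of the queue half of that pattern (not the WIP code linked above); TaskQueue, websocket_thread_main, and send_from_r are illustrative names only:

// Sketch only: all websocketpp interaction happens on one background thread,
// which drains a mutex-guarded queue of tasks enqueued from the R thread.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class TaskQueue {
public:
   void push(std::function<void()> task) {
      {
         std::lock_guard<std::mutex> lock(mutex_);
         tasks_.push(std::move(task));
      }
      cv_.notify_one();
   }
   std::function<void()> pop() {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return !tasks_.empty(); });
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      return task;
   }
private:
   std::mutex mutex_;
   std::condition_variable cv_;
   std::queue<std::function<void()> > tasks_;
};

TaskQueue ws_tasks;

// Background thread: the only place websocketpp objects are ever touched,
// which sidesteps the thread-safety questions above.
void websocket_thread_main() {
   for (;;) {
      ws_tasks.pop()();   // blocks until the R side enqueues work
   }
}

// Called from the R thread (e.g. from an R-level send()); it only enqueues
// a task and never touches websocketpp state directly.
void send_from_r(const std::string& text) {
   ws_tasks.push([text]() {
      // ... client.send(hdl, text, websocketpp::frame::opcode::text) ...
   });
}

The key invariant is that websocketpp objects live entirely on the background thread; delivery in the other direction (C++ to R) would go through later, as in the earlier sketch.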
But the decision for now is that the benefit doesn't justify the time it would take to clean this up. We like the idea of building these patterns into later directly so that client packages like httpuv or websocket could just use them. But until that day comes, we'll probably leave this as-is unless it becomes more of an issue.