Server-side timeout
Closed this issue · 4 comments
We should be able to set a maximum length of request. If the greenlet takes longer, it gets killed. This means we don't let greenlets choke up the server for too long.
Initial pass looks something like this:
--- a/baseplate/server/thrift.py
+++ b/baseplate/server/thrift.py
@@ -1,3 +1,4 @@
+import functools
import socket
from typing import Any
@@ -5,6 +6,8 @@ from typing import Dict
from typing import Tuple
from typing import Union
+import gevent
+
from gevent.pool import Pool
from gevent.server import StreamServer
from thrift.protocol.THeaderProtocol import THeaderProtocolFactory
@@ -59,12 +62,14 @@ def make_server(server_config: Dict[str, str], listener: socket.socket, app: Any
server_config,
{
"max_concurrency": config.Integer,
+ "response_timeout": config.Timespan,
"stop_timeout": config.Optional(config.Integer, default=0),
},
)
pool = Pool(size=cfg.max_concurrency)
- server = GeventServer(processor=app, listener=listener, spawn=pool)
+ spawn = functools.partial(pool.spawn, gevent.with_timeout, cfg.response_timeout.total_seconds())
+ server = GeventServer(processor=app, listener=listener, spawn=spawn)
server.stop_timeout = cfg.stop_timeout
runtime_monitor.start(server_config, app, pool)
Problems:
- The asymetry of the timeout option types is weird.
- The stack trace is horrific. Maybe a custom timeout exception would help?
Related question: Once I define this and I start seeing this timeout occur, how will I be able to track down the cause? Would there be any way I can log and include pending http/thrift/cass requests?
That implementation is bad. It's timing out whole connections not individual requests. Will keep looking into this.
@curioussavage: Absolutely will make sure the telemetry is there to diagnose the timeout. At the minimum, the timeout exception is dumping stack traces for the current and other greenlets so you can see what we were waiting for. More to come.
This is the basics of a working implementation: master...spladug:request-timeout2
but there are two big questions:
- we should be able to configure global (all endpoint) and endpoint-specific timeouts in the config file (timeouts might vary in different configurations of an app, like in different pools with different types of clients). how do we get that information into the right part of the framework?
- this necessarily uses
gevent
directly where we haven't done so before outside the server code. is that ok? is there any way to avoid that?