Streaming responses and latency
bheilbrun opened this issue · 1 comments
What's the best way to measure duration for streaming endpoints?
If I'm not mistaken, the current latency measurements don't work for streaming responses: prometheus_flask_exporter measures the time to return the response generator rather than the time it takes to actually generate the streamed response.
Flask's stream documentation gives an example of a streaming endpoint:

```python
@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield f"{','.join(row)}\n"
    return app.response_class(generate(), mimetype='text/csv')
```
In this example, prometheus_flask_exporter would start a duration timer via `Flask.before_request` and then record the duration via `Flask.after_request`. When `after_request` is invoked, the actual response bytes haven't been generated or sent yet.
I wonder if measuring via `Flask.teardown_request` along with `stream_with_context()` would work, but I'm not sure.
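As an untested sketch of that idea (`durations` here is just a stand-in list; a real setup would call a histogram's `observe()` instead): `stream_with_context()` keeps the request context pushed until the generator is exhausted, so a `teardown_request` handler should only fire once the whole stream has been produced.

```python
import time
from flask import Flask, g, stream_with_context

app = Flask(__name__)
durations = []  # stand-in for a real metric; observe a Histogram here instead

@app.before_request
def start_timer():
    g.start_time = time.perf_counter()

@app.teardown_request
def record_duration(exc):
    # With stream_with_context, the request context stays alive while the
    # response is streamed, so this teardown runs after the last row is
    # yielded rather than when the generator is first returned.
    durations.append(time.perf_counter() - g.start_time)

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in [['a', 'b'], ['c', 'd']]:  # stand-in for iter_all_rows()
            yield f"{','.join(row)}\n"
    return app.response_class(stream_with_context(generate()),
                              mimetype='text/csv')
```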
Thoughts appreciated!
That's an interesting one! I haven't checked `stream_with_context()` yet, but my gut feeling is that you could add a custom metric on the `generate()` function inside the request handler, and that should be timed OK.
We could also look at adding some streaming-friendly wrappers to work with those handlers directly, I haven't looked at that yet.
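One possible shape for such a wrapper (a sketch, not part of the library's API; `timed_stream` and `observe` are hypothetical names): wrap the generator so a duration is reported only once the stream is exhausted.

```python
import time

def timed_stream(gen, observe):
    """Yield from `gen`, then report the total streaming time.

    `observe` can be any callable taking a float number of seconds,
    e.g. a prometheus_client Summary's .observe method.
    """
    start = time.perf_counter()
    try:
        yield from gen
    finally:
        # Runs when the generator is exhausted (or closed), so the
        # recorded duration covers the whole stream, not just setup.
        observe(time.perf_counter() - start)

# Inside the request handler it might be used as:
#   return app.response_class(
#       timed_stream(generate(), STREAM_DURATION.observe),
#       mimetype='text/csv')
```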