Streaming responses and latency
bheilbrun opened this issue · 1 comments
What's the best way to measure duration for streaming endpoints?
If I'm not mistaken, the current latency measurements don't work for streaming responses: prometheus_flask_exporter measures the time to return the response generator rather than the time it takes to actually generate the streamed response.
Flask's stream documentation gives an example of a streaming endpoint:

```python
@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield f"{','.join(row)}\n"
    return app.response_class(generate(), mimetype='text/csv')
```
In this example, prometheus_flask_exporter would start a duration timer via `Flask.before_request` and then record the duration via `Flask.after_request`. When `after_request` is invoked, the actual response bytes haven't been generated or sent yet.
I wonder if measuring via `Flask.teardown_request` along with `stream_with_context()` would work, but I'm not sure.
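As an untested sketch of that idea (`durations` here is just a stand-in list; a real setup would call a histogram's `observe()` instead): `stream_with_context()` keeps the request context pushed until the generator is exhausted, so a `teardown_request` handler should only fire once the whole stream has been produced.

```python
import time
from flask import Flask, g, stream_with_context

app = Flask(__name__)
durations = []  # stand-in for a real metric; observe a Histogram here instead

@app.before_request
def start_timer():
    g.start_time = time.perf_counter()

@app.teardown_request
def record_duration(exc):
    # With stream_with_context, the request context stays alive while the
    # response is streamed, so this teardown runs after the last row is
    # yielded rather than when the generator is first returned.
    durations.append(time.perf_counter() - g.start_time)

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in [['a', 'b'], ['c', 'd']]:  # stand-in for iter_all_rows()
            yield f"{','.join(row)}\n"
    return app.response_class(stream_with_context(generate()),
                              mimetype='text/csv')
```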
Thoughts appreciated!
That's an interesting one! I haven't checked `stream_with_context()` yet, but my gut feeling is that you could add a custom metric on the `generate()` function inside the request handler, and that should be timed OK.
We could also look at adding some streaming-friendly wrappers to work with those handlers directly, I haven't looked at that yet.
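One possible shape for such a wrapper (a sketch, not part of the library's API; `timed_stream` and `observe` are hypothetical names): wrap the generator so a duration is reported only once the stream is exhausted.

```python
import time

def timed_stream(gen, observe):
    """Yield from `gen`, then report the total streaming time.

    `observe` can be any callable taking a float number of seconds,
    e.g. a prometheus_client Summary's .observe method.
    """
    start = time.perf_counter()
    try:
        yield from gen
    finally:
        # Runs when the generator is exhausted (or closed), so the
        # recorded duration covers the whole stream, not just setup.
        observe(time.perf_counter() - start)

# Inside the request handler it might be used as:
#   return app.response_class(
#       timed_stream(generate(), STREAM_DURATION.observe),
#       mimetype='text/csv')
```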