rycus86/prometheus_flask_exporter

Streaming responses and latency

bheilbrun opened this issue · 1 comment

What's the best way to measure duration for streaming endpoints?

If I'm not mistaken, the current latency measurements don't work for streaming responses: prometheus_flask_exporter measures the time to return the response generator, not the time it takes to actually generate and stream the response body.

Flask's streaming documentation gives an example of a streaming endpoint:

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():  # iter_all_rows() is a placeholder in the docs example
            yield f"{','.join(row)}\n"
    return app.response_class(generate(), mimetype='text/csv')

In this example, prometheus_flask_exporter would start a duration timer via Flask.before_request and then record the duration via Flask.after_request. At the point where after_request is invoked, none of the actual response bytes have been generated or sent yet.
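
Here is a small self-contained demo of the behaviour (the /slow-stream endpoint and handler names are just made up for illustration): after_request fires almost immediately, while the body takes about three seconds to stream.

import time

from flask import Flask, g

app = Flask(__name__)


@app.before_request
def start_timer():
    g.start = time.perf_counter()


@app.after_request
def stop_timer(response):
    # Logs roughly +0.000s even though streaming the body takes ~3 seconds,
    # because the generator below hasn't been consumed yet at this point.
    app.logger.warning('after_request at +%.3fs', time.perf_counter() - g.start)
    return response


@app.route('/slow-stream')
def slow_stream():
    def generate():
        for i in range(3):
            time.sleep(1)  # simulate slow row production
            yield f'row {i}\n'
    return app.response_class(generate(), mimetype='text/plain')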

I wonder if measuring via Flask.teardown_request together with stream_with_context() would work, but I'm not sure.
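
Roughly what I have in mind, completely untested (the metric name and the iter_all_rows() stub are made up for the sketch):

import time

from flask import Flask, g, request, stream_with_context
from prometheus_client import Histogram

app = Flask(__name__)

# Hypothetical custom metric, separate from the exporter's built-in ones.
STREAM_DURATION = Histogram(
    'flask_stream_duration_seconds',
    'Time from request start until the streamed body is fully generated',
    ['path'],
)


def iter_all_rows():
    # Stand-in for the real row source in the docs example above.
    yield from ([str(i), 'x'] for i in range(1000))


@app.before_request
def start_timer():
    g.stream_start = time.perf_counter()


@app.teardown_request
def record_stream_duration(exc=None):
    start = getattr(g, 'stream_start', None)
    if start is not None:
        STREAM_DURATION.labels(path=request.path).observe(time.perf_counter() - start)


@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield f"{','.join(row)}\n"
    # stream_with_context() is supposed to keep the request context alive
    # until the generator is exhausted, so teardown_request would only run
    # after the last chunk -- that deferral is the part I'm not sure about.
    return app.response_class(stream_with_context(generate()), mimetype='text/csv')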

Thoughts appreciated!

That's an interesting one! I haven't checked stream_with_context() yet, but my gut feeling is that you could add a custom metric around the generate() function inside the request handler, and that should be timed OK.
We could also look at adding some streaming-friendly wrappers that work with those handlers directly; I haven't looked into that yet.
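
For example, something along these lines might do it, building on your example above (untested sketch; it uses a plain prometheus_client histogram observed in the generator's finally block rather than one of the exporter's decorators, and the metric name is made up):

from time import perf_counter

from prometheus_client import Histogram

# Hypothetical custom metric for this one endpoint.
CSV_STREAM_DURATION = Histogram(
    'large_csv_stream_duration_seconds',
    'Time to generate the full /large.csv response body',
)


@app.route('/large.csv')
def generate_large_csv():
    def generate():
        start = perf_counter()
        try:
            for row in iter_all_rows():
                yield f"{','.join(row)}\n"
        finally:
            # Runs once the generator is exhausted or closed, i.e. after
            # the last chunk has been produced.
            CSV_STREAM_DURATION.observe(perf_counter() - start)

    return app.response_class(generate(), mimetype='text/csv')

The finally block should run when the generator is exhausted or closed, so the observation would cover the whole streaming time even if the client disconnects early.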