`FileStorage` closed unexpectedly when using `copy_current_request_context`
jimdigriz opened this issue · 1 comment
Whilst using boto3 for some S3 work in Flask, I needed to register an event hook, which led me to discover that `request.files[...]` gets closed before the request is completely handled. I found that one of the events runs in a different thread, so I wrapped the hook with `copy_current_request_context`, which worked; but I then found that by doing this, after calling `object.load()` (which triggers the hooks), `request.files[...]` is closed prematurely.

After some digging I found pallets/werkzeug#1685, which I guess means that somehow the file streams lose their context whilst bouncing through the thread and into another context (the copy of the original request context). I know that reported issue is about the main `request.stream`, but this also affects the `FileStorage` streams under `request.files`.
To reproduce it, here is the minimal code I managed to get this occurring with; the only difference between `test_work` and `test_bust` is the call to `session.events.register(...)`:
```python
import boto3
from flask import Flask, abort, copy_current_request_context, request

app = Flask(__name__)

def _test(params, **kwargs):
    pass

@app.route('/work', methods=['GET', 'POST'])
def test_work():
    if 'test' not in request.files:
        abort(400)
    file = request.files['test']
    session = boto3.session.Session()
    s3 = session.resource('s3')
    object = s3.Object('test', 'test')
    try:
        object.load()
    except Exception:
        # bucket/object need not exist; we only need the hooks to fire
        pass
    return {'closed': request.stream.closed, 'file': file.closed}

@app.route('/bust', methods=['GET', 'POST'])
def test_bust():
    if 'test' not in request.files:
        abort(400)
    file = request.files['test']
    session = boto3.session.Session()
    session.events.register('provide-client-params.s3',
                            copy_current_request_context(_test))
    s3 = session.resource('s3')
    object = s3.Object('test', 'test')
    try:
        object.load()
    except Exception:
        pass
    return {'closed': request.stream.closed, 'file': file.closed}
```
Spin up a local S3 endpoint; running one means you are not left waiting ten seconds for boto3's retry logic to give up:

```shell
docker run -it --rm -p 9000:9000 -p 9001:9001 quay.io/minio/minio server --console-address :9001 /data
```
Run with:
```shell
env AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin AWS_ENDPOINT_URL=http://localhost:9000 ./.venv/bin/flask --app test run
```
Test with:
```shell
$ echo hello world > test.txt
$ curl -sS --fail-with-body -D /dev/stderr -F 'test=@test.txt' http://localhost:5000/work
HTTP/1.1 200 OK
Server: Werkzeug/3.0.1 Python/3.11.2
Date: Mon, 29 Jan 2024 10:01:58 GMT
Content-Type: application/json
Content-Length: 30
Connection: close

{"closed":false,"file":false}
```
And test the failure (where `file` gets closed) with:
```shell
$ curl -sS --fail-with-body -D /dev/stderr -F 'test=@test.txt' http://localhost:5000/bust
HTTP/1.1 200 OK
Server: Werkzeug/3.0.1 Python/3.11.2
Date: Mon, 29 Jan 2024 10:02:42 GMT
Content-Type: application/json
Content-Length: 29
Connection: close

{"closed":false,"file":true}
```
My expectation is that `file` remains open until I return my response from `test_bust`.
My use case is that I need to pass a `FileStorage` to an S3 upload request, so obviously with the file handle closed that becomes somewhat difficult :)
I worked around the problem by using `functools.partial` and shimming over just what I needed instead of using `request`, but I am reporting this in case it is considered a bug, or at least so others can find this and save themselves hours of their lives trying to figure out what is happening.
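For anyone hitting the same wall, here is a rough sketch of that `functools.partial` workaround. All names here are hypothetical; the idea is that the hook captures the plain values it needs at registration time instead of closing over `request` (so no request context ever needs to be copied into boto3's thread):

```python
import functools

def _hook(captured_filename, params, **kwargs):
    # The hook only sees plain values frozen at registration time;
    # it never touches `request`, so no context copy is required.
    params.setdefault('Metadata', {})['orig-filename'] = captured_filename

# Inside the view, freeze the values the hook needs up front:
filename = 'test.txt'  # in the real view this would be file.filename
hook = functools.partial(_hook, filename)
# session.events.register('provide-client-params.s3', hook)

# Simulate boto3 invoking the registered hook with call params:
params = {}
hook(params)
print(params)  # {'Metadata': {'orig-filename': 'test.txt'}}
```

The design point is that `functools.partial` binds ordinary objects, whose lifetimes are independent of the request context, whereas `copy_current_request_context` ties the hook to a context that gets torn down behind your back.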
My environment is:
- Python version: 3.11.2
- Flask version: 3.0.1
- Boto3 version: 1.34.29
It's not possible to copy the stream (and thus the files) from the initial request into the copied context; it's not possible to copy open file-like objects in general. What `copy_current_request_context` copies is the context, not the request the context refers to, so both contexts still refer to the same data. Once the view or the wrapped function ends (whichever finishes first), its context is popped, which triggers cleanup of the request it's associated with. Therefore, as said in the other issue you linked, you need to store the file data first so that it is available separately from the request data.
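A minimal sketch of "store the file data first", using a plain `io.BytesIO` as a stand-in for the `FileStorage` stream (`detach_stream` is a hypothetical helper, not Flask API): copy the upload's bytes into an independent buffer before anything else can close the request stream.

```python
import io

def detach_stream(stream):
    """Read the full payload into an independent in-memory buffer."""
    return io.BytesIO(stream.read())

# Simulate the bug: the original stream gets closed out from under us,
# but the copy made beforehand is unaffected.
original = io.BytesIO(b'hello world\n')
copy = detach_stream(original)
original.close()  # what popping the request context effectively does

payload = copy.read()
print(original.closed, copy.closed, payload)  # True False b'hello world\n'
```

For large uploads you would likely spool to a temporary file (e.g. `tempfile.SpooledTemporaryFile`) rather than hold everything in memory, but the principle is the same: the copy's lifetime is no longer tied to the request.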