Opening concurrent file handles for all instancefiles risks breaching OS limit.
Closed this issue · 6 comments
The use of the BinaryIO
type for instancefiles
causes file handles to be opened concurrently by Click for all instance files supplied at the command line.
A user may, for example, run a glob expansion on a number of targets that exceeds the limit of open files they are permitted by their operating system. Currently check-jsonschema
fails in this case with OSError: [Errno 24] Too many open files
. It would be nice if files were opened sequentially so that the user could supply a far greater number of arguments to a single command without error.
A minimal case for simulating this on Linux is as follows:
# Set open files limit to 1024 (this is a typical default on Linux,
# but we do it explicitly to make the test repeatable):
ulimit -n 1024
mkdir /tmp/c-js
echo '{}' > /tmp/c-js/schema.js
for i in $(seq -f "%04g" 1024); do echo '{}' > /tmp/c-js/instance.$i.json; done;
check-jsonschema --schemafile /tmp/c-js/schema.js /tmp/c-js/instance.0001.json
# ok -- validation done
check-jsonschema --schemafile /tmp/c-js/schema.js /tmp/c-js/instance.*.json
# OSError: [Errno 24] Too many open files
Could it be possible to type the instancefiles
argument as a str
or Path
and only open the file handles one by one as needed?
Thanks for the report/request!
It's certainly possible to do -- it is how earlier versions worked. I switched to the built-in click behavior when adding support for stdin, as it seemed like a nice simplification.
Before I pursue this, a quick question: are you running into this issue today, do you foresee running into it (soon), or is the issue purely theoretical?
This may be relevant for prioritization, and I may be pressed on it if I look to make any upstream contributions to click in support of it.
I'll want to look into click's capabilities to see if it can open the files lazily before pursuing a change here.
Thanks for the quick response. The issue already occurs for me on a project where I'm globbing around 2,000 files. However, it's simple to work around as users can normally raise the ulimit value to anything they like (eg ulimit -n 4096
), so it's more a matter of convenience and tidy resource management than a critical issue.
I checked and click does have a lazy mode for file objects, but it rigs them up to close as part of the command exiting. I don't think that using it naively will solve this.
I'll have this near the front of my queue when I'm next able to devote some time to this project (likely in a little over a week).
Minor update:
I've been tinkering on this but it's not working quite as smoothly as I wanted. The click lazy file object does an eager open/close to catch some errors (e.g. permissions) early, which messes with some of the fifos in the testsuite.
It works in a live context, when I use a shell to write a fifo, so I must be doing something slightly wrong in the tests. I might also file a click bug report, since I think the open/close behavior is slightly unsafe for special file types.
I'm going to try to figure out how to fix the fifo tests, but I do have some options for handling this all in the worst case.
Tested and working perfectly. Thanks very much!
Wow, thanks for testing before I even cut the release! I was meaning to push this out yesterday but got a bit delayed.
It's great to hear that it's working for you as intended.
I've just dropped v0.27.3 with this included -- it's the primary update in there. Let me know if you see any issues with the released version. 😄