schema spec file is loaded every time - really hurts performance
brycedrennan opened this issue · 5 comments
I was trying to run a fuzzer but was surprised the validation of the spec was taking so long.
After running a profiler it became apparent that a large part of the issue is that the swagger schema file is loaded every single time that a validation call is made.
This monkeypatch sped things up significantly. Runtime of 1050 validations went from 171 seconds to 28 seconds.
```python
from functools import lru_cache

from swagger_spec_validator import validator20

validator20.read_file = lru_cache()(validator20.read_file)
```
Thanks for reporting this, looks like a small change that improves speed by a lot.
PRs are always welcome ;)
It's not clear to me how this actually speeds up the processing, since read_file is used in validate_json and other methods that do not call it in a loop.
Would it be possible to upload a complete example of the specs and the Python test file that you used, and/or the pstats result of the profiling?
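(For reference, a pstats dump like the one requested can be gathered with cProfile. The sketch below profiles a stand-in workload of plain `json.loads` calls, since the validator isn't imported here; swap in `validator20.validate_spec(spec)` for a real run.)

```python
import cProfile
import io
import json
import pstats

# Stand-in workload; replace with validator20.validate_spec(spec)
# to profile the actual validation path.
def workload():
    for _ in range(100):
        json.loads('{"swagger": "2.0", "paths": {}}')

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Dump the top entries sorted by cumulative time, as pstats reports them.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```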
```python
import time
from functools import lru_cache

from swagger_spec_validator import validator20

minimal_spec = {
    "swagger": "2.0",
    "info": {"title": "Test", "version": "1.0"},
    "paths": {},
    "definitions": {},
}

start = time.perf_counter()
for _ in range(100):
    validator20.validate_spec(minimal_spec)
print(time.perf_counter() - start)

# Apply the caching monkeypatch and repeat the same workload.
validator20.read_file = lru_cache()(validator20.read_file)

start = time.perf_counter()
for _ in range(100):
    validator20.validate_spec(minimal_spec)
print(time.perf_counter() - start)
```
Results:

```
12.765044290106744
0.932626988273114
```
In our case we want to validate lots of swagger specs, but it's painfully slow to do so.
Sorry for the late reply.
I understand your report now; we did not optimise the flow for multiple or continuous validation of swagger specs.
Checking the code-base again, I see that validator20.read_file is invoked only to read the schemas stored in swagger_spec_validator/schemas/v#, not user-provided swagger specs.
I'm mentioning this because caching the specs forever, with an unbounded cache, might have the side effect of growing the memory required by the Python interpreter.
My recommendation would be to use an lru_cache with maxsize defined (currently we have 12 files). To keep backward compatibility, we would need a new method that does the cached read (i.e. read_from_schemas or similar).
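A minimal sketch of that suggestion, assuming the hypothetical name read_from_schemas from the comment above (the actual PR may differ); a call counter stands in for disk I/O so the bounded cache's effect is visible:

```python
from functools import lru_cache

# Stand-in for swagger_spec_validator's uncached read_file; the call
# counter lets us observe the cache short-circuiting repeat reads.
CALLS = {"read_file": 0}

def read_file(path):
    CALLS["read_file"] += 1
    return "<contents of %s>" % path

# New, separately named cached reader so the public read_file keeps its
# original (uncached) behaviour. maxsize=12 matches the roughly 12
# schema files shipped under swagger_spec_validator/schemas/.
@lru_cache(maxsize=12)
def read_from_schemas(path):
    return read_file(path)

read_from_schemas("schemas/v2.0/schema.json")
read_from_schemas("schemas/v2.0/schema.json")
print(CALLS["read_file"])  # → 1, the second call is served from the cache
```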
@brycedrennan will you have some bandwidth to provide a PR for this change?
@macisamuele PR submitted
resolved