for-GET/jesse

Performance issues

Closed this issue · 7 comments

sneis commented

I was looking at jesse's performance with one or two dozen schemas, which I added under more or less arbitrary keys via jesse:add_schema("some key", SchemaData, [{parser_fun, fun jsx:decode/1}]).
When doing validation I found that (at least on my macOS box with an SSD) about 30% of the run time is spent treating my arbitrary key as a relative file name. It was a bit better on some other machines/OSes, but jesse_state:canonical_path consistently used a large part of the run time in my tests. Just making my "arbitrary keys" start with "http://" (so canonical_path uses string functions instead of filename functions) gave me a 30% speedup (run time for a single pass of my little test suite dropped from roughly 160 ms to 110 ms).
IMHO implicitly treating such arbitrary keys (that don't start with "file://") as files comes as a surprise to the user of jesse. Why not have that last clause of canonical_path just return the arbitrary key that was given to it, instead of forcibly treating it as a file name and "upgrading" it to a "file://" URI? Having to explicitly create an invalid URI like "http://some.name" and use that as a key, just to avoid that runtime penalty, feels really strange.
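
For illustration, the "http://" workaround described above could be wrapped in a small helper. This is only a sketch: the module name schema_keys and the key/1 helper are made up for this example, and it assumes jesse and jsx are on the code path. The add_schema call and the parser_fun option are the ones quoted above.

```erlang
-module(schema_keys).
-export([key/1, load/1, validate/2]).

%% Hypothetical helper: turn an arbitrary schema name into an "http://" key,
%% so that the key is not treated as a relative file name.
key(Name) when is_list(Name) ->
    "http://" ++ Name.

%% Register a list of {Name, SchemaData} pairs under prefixed keys.
load(Schemas) ->
    lists:foreach(
      fun({Name, SchemaData}) ->
              jesse:add_schema(key(Name), SchemaData,
                               [{parser_fun, fun jsx:decode/1}])
      end,
      Schemas).

%% Validate a JSON binary against the schema registered for Name.
validate(Name, Json) ->
    jesse:validate(key(Name), Json, [{parser_fun, fun jsx:decode/1}]).
```

Callers keep using their plain names, and only this one module knows about the prefix, so it is easy to drop again if the fallback behaviour ever changes.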

And then you just released 1.5.0, so of course I was curious about how that would affect performance. After a quick glance at the source code, I didn't have much hope that I would be freed from the need to prefix my keys with "http://" - and indeed, I wasn't.

Much worse, compared to the git version I used up to now (bd1c7e5), run times roughly doubled. That is, I did all this profiling and testing to get the run time of my test suite down from 160 ms to 110 ms, and after updating to the latest jesse release I'm now at 210 ms (and if I remove the "http://" prefix hack for testing, it's way over 300 ms). After doing some profiling on 1.5.0, I can't really point to a specific place in the code (like I could with jesse_state:canonical_path) and say that's where all the additional time goes, so for now I can only report those results to you and hope that maybe you can figure out how to speed things up again ...
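
Numbers like the ones above can be collected with a small timing helper around timer:tc/1. The following is a minimal sketch, not the harness actually used for these measurements; the module name bench and the function avg_ms/2 are invented for the example.

```erlang
-module(bench).
-export([avg_ms/2]).

%% Run the zero-arity Fun N times and return the mean wall-clock
%% time per run in milliseconds (timer:tc/1 reports microseconds).
avg_ms(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, N) end),
    Micros / (N * 1000).

%% Call Fun N times, ignoring its result.
repeat(_Fun, 0) ->
    ok;
repeat(Fun, N) ->
    _ = Fun(),
    repeat(Fun, N - 1).
```

Passing a fun that runs one pass of the test suite and averaging over a few hundred runs should smooth out some of the noise when comparing two jesse versions.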

Since you are possibly wondering why anyone would care about jesse's performance: think of e.g. a cowboy server which receives JSON requests from clients, validates these requests and replies either with ok (and starts a job on a backend server) or with an error message about invalid JSON. In that scenario jesse really consumes a large chunk of the run time on that frontend. And since I'm hoping for lots and lots of clients, jesse validation really becomes a bottleneck during traffic spikes.

sneis commented

Thanks for your reply. As a variant of 2, if an id is not a URI, you could maybe make it into an http URI instead of a file URI? Anyway, after having found that "bottleneck" and the workaround of using a full (http) URI, that's not a big concern for me any more. It's mainly the performance degradation I see with 1.5.0 that's worrying me.

sneis commented

Yes, while "file:" is probably closer to the truth, it also seems slower. But actually, the difference I see between the file scheme and the http scheme is probably within measurement inaccuracy, so let's ignore that. The thing that's really expensive is the call to filename:absname in the last clause of jesse_state:canonical_path, and prioritizing the http scheme for non-URIs would reasonably allow getting rid of that expensive call (IMHO).
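
To make the two behaviours concrete, here is a simplified sketch. It is not jesse's actual canonical_path code, only an illustration based on the description in this thread; the module canonical_sketch and both function names are hypothetical.

```erlang
-module(canonical_sketch).
-export([with_absname/1, without_absname/1]).

%% Fallback as described in this thread: keys with a known scheme pass
%% through, anything else is resolved as a file name via filename:absname/1.
with_absname("file://" ++ _ = Uri)  -> Uri;
with_absname("http://" ++ _ = Uri)  -> Uri;
with_absname("https://" ++ _ = Uri) -> Uri;
with_absname(Key) ->
    "file://" ++ filename:absname(Key).

%% Suggested variant: treat non-URI keys as http URIs, which needs
%% only string operations and no filename call.
without_absname("file://" ++ _ = Uri)  -> Uri;
without_absname("http://" ++ _ = Uri)  -> Uri;
without_absname("https://" ++ _ = Uri) -> Uri;
without_absname(Key) ->
    "http://" ++ Key.
```

With this sketch, with_absname("some key") has to resolve the key against the current working directory, while without_absname("some key") is a plain string concatenation.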

And your other question: Yes, I do see a performance degradation even with proper URIs when switching to 1.5.0. Even with proper URIs, runtime for my test goes up from around 110 to about 210 ms.

While I should point out that I still haven't gotten to this, I see no difference code-wise between file: and http: (https://github.com/for-GET/jesse/blob/master/src/jesse_state.erl#L331). filename:absname/1 is only called in the fallback for unknown schemes.

> Even with proper URIs, runtime for my test goes up from around 110 to about 210 ms.

Weird. Thanks for the numbers!

Is it possible to get a tiny excerpt of your schemas? Just an example of how you load them (e.g. add_schema, load_schema, etc.) and what the "id" and possibly "$ref" properties look like. I'm wondering if it's a specific code path whose performance degraded.

Hello again @sneis! I only now took some time to look at this, and I don't see those numbers, i.e. a doubling of the processing time. So I'm thinking maybe I don't have the "correct" setup that triggers this performance degradation.

Looking blindly at the diff related to this (canonical paths), I guess what stands out is this code flow https://github.com/for-GET/jesse/blob/master/src/jesse_state.erl#L350 but an re:split and a basic looping through the path items doesn't sound like the culprit.
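
For context, a split-and-loop step of this kind typically looks like the sketch below. It is not the code at the linked line, just a generic illustration under the assumption that the path is split on "/" and that "." and ".." segments are resolved; the module name path_sketch is made up.

```erlang
-module(path_sketch).
-export([normalize/1]).

%% Split a path on "/", resolve "." and ".." segments, and join it again.
normalize(Path) ->
    Segments = re:split(Path, "/", [{return, list}]),
    Kept = lists:reverse(resolve(Segments, [])),
    lists:append(lists:join("/", Kept)).

%% Acc holds the segments kept so far, most recent first.
resolve([], Acc) ->
    Acc;
resolve(["." | Rest], Acc) ->
    resolve(Rest, Acc);
resolve([".." | Rest], [Prev | Acc]) when Prev =/= "" ->
    %% Drop the previous segment, but never the leading root marker "".
    resolve(Rest, Acc);
resolve([".." | Rest], Acc) ->
    resolve(Rest, Acc);
resolve([Segment | Rest], Acc) ->
    resolve(Rest, [Segment | Acc]).
```

The work here is linear in the number of path segments, which fits the impression that a step like this alone should not double the run time.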

@sneis if there's a way to highlight the issue, or even better - a PR to fix the issue, shoot! I'm closing the issue for now 🙋‍♂️