MJPA/SimpleJSON

Parse performance and possible fix

tthmok opened this issue · 5 comments

So the thing I found was that parsing certain large JSON documents gave very bad performance. The worst-performing JSON tended to be large because it contained a lot of small objects with number values. Nesting depth didn't seem to matter much performance-wise.

I did some performance data collection with dbx, which pointed me to the two lines of code I changed in this gist: https://gist.github.com/1124019#comments

Basically what is happening is that every time a part of the JSON turns out to be anything but a string (bool, null, number, array, object), those conditionals get checked. They contain wcslen calls which it looks like the author was using to ensure that the remaining input was long enough to compare against "true", "false" and "null". I don't think these are needed, because the wcsncasecmp calls should handle that for us: they don't segfault or error out if the given string is shorter than the n parameter. So by just removing those length calls we save a ton of work, since wcslen walks the entire remaining input and isn't cheap on large JSON.
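
The shape of the change, as a hedged reconstruction rather than the exact SimpleJSON source (the function names are illustrative, and `data` stands in for the parser's current position):

```cpp
#include <wchar.h>  // wcslen, wcsncasecmp (POSIX)

// Before: wcslen() scans the entire remaining input on every
// non-string value, just to guard the comparison below.
bool looks_like_true_before(const wchar_t *data) {
    return wcslen(data) >= 4 && wcsncasecmp(data, L"true", 4) == 0;
}

// After: wcsncasecmp() already stops at the first NUL in either
// string, so a too-short input simply compares unequal and the
// scan over the remaining input disappears.
bool looks_like_true_after(const wchar_t *data) {
    return wcsncasecmp(data, L"true", 4) == 0;
}
```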

I tested the change myself and it seems to work, but there may be edge cases I'm not thinking of. Let me know what you think.

MJPA commented

I don't see any problems with removing the wcslen() call - as you say, wcsncasecmp() would also be checking the length of the "string". I can't see any edge case issues...

Could you pastebin a sample of the large JSON that parses poorly? I'll add a test to check performance. What sort of performance gain are you seeing, btw?

Here's an example of the type of JSON I'm running against: http://pastebin.com/HTX7H8R9
It's a bit contrived since this particular JSON was just for testing purposes, but it's still similar enough to the real JSON I'm using. Also, my real JSON may or may not be flattened and have its whitespace removed before the call to JSON::Parse, but pastebin only supports up to 500kb, so I removed all the whitespace.

In my code I put a small clock-based timer around just the line that calls JSON::Parse.
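
Roughly, the harness looked like this (a minimal sketch assuming SimpleJSON's JSON::Parse(const wchar_t*) API; `time_parse` and `json_text` are hypothetical names):

```cpp
#include <cstdio>
#include <ctime>
#include "JSON.h"

// Time just the JSON::Parse call, as described above.
void time_parse(const wchar_t *json_text) {
    std::clock_t start = std::clock();
    JSONValue *value = JSON::Parse(json_text);  // the line being timed
    double seconds = double(std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("JSON::Parse took %.3f seconds\n", seconds);
    delete value;  // Parse() heap-allocates the result (NULL on failure)
}
```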

The original code takes about 17 seconds. After removing the length calls it drops to about 0.18 seconds.

MJPA commented

With that JSON on an i7 we go from ~9s to ~0.1s. I'll commit the change later today - there is also a wcslen() call in JSON.cpp where we decode escaped Unicode characters, which could do with being changed to a better method.

I'll update this issue when the change is committed

From a little searching I see there's a wcsnlen (http://msdn.microsoft.com/en-us/library/z50ty2zh(v=vs.80).aspx), but it doesn't seem to be standard across all C libraries. I think that's probably exactly what's needed: get the length, but only up to a maximum of n characters. It probably wouldn't be too hard to write such a function, though.
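
Such a function is only a few lines. A minimal sketch (hypothetical name; the version actually committed may differ):

```cpp
#include <cstddef>

// Like wcslen(), but never scans past max_len characters.
std::size_t simplejson_wcsnlen(const wchar_t *s, std::size_t max_len) {
    std::size_t len = 0;
    while (len < max_len && s[len] != L'\0')
        ++len;
    return len;
}
```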

MJPA commented

That's the sort of thing I had in mind. Seeing as it's not standard, I've come up with a simple implementation that does the same thing.

While benchmarking it on my Mac mini, the JSON you provided took 12 minutes to parse. Investigating that, it turned out to be down to the wcsncasecmp() implementation in use for the OS X builds - that has also been refactored, and it's now parsing in ~0.2s on my Mac mini.
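
For reference, a portable replacement only needs a character-by-character loop that stops at the first difference or NUL. A minimal sketch (hypothetical name; the refactored code in the repo may differ):

```cpp
#include <cstddef>
#include <cwctype>

// Case-insensitive compare of at most n wide characters; never
// scans past n, and stops early at a difference or a NUL.
int simple_wcsncasecmp(const wchar_t *s1, const wchar_t *s2, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::wint_t c1 = std::towlower(s1[i]);
        std::wint_t c2 = std::towlower(s2[i]);
        if (c1 != c2) return (c1 < c2) ? -1 : 1;
        if (c1 == L'\0') return 0;  // both strings ended together
    }
    return 0;
}
```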

Thanks again for the recommendation to speed things up - the changes have been committed.