Optimize out toString from START/TEXT/END event IRule lookup

Question

Optimize out toString from START/TEXT/END event IRule lookup

Opened this issue 14 years ago · 5 comments

Right now every time there is a START, TEXT or END event from the underlying XMLPullParser, I do a toString on the internal location to see if there are any rules with locationPaths that match.

Using HPROF to profile the code it looks like a larger amount of memory and disturbing amount of CPU is allocated to creating and destroying all those temporary strings.

Answer 1 · 2011-03-10T20:15:02.000Z

Memory profiling shows roughly 84MB of memory getting allocated to String objects and their underlying char[] when running the Benchmark end-to-end.

CPU profiling shows 82% of CPU time spent calling toString on the location inside of the doXXX methods.

Answer 2 · 2011-03-10T20:19:11.000Z

ORIGINAL DESIGN

For simplicity sake, the original design of SJXP used the Strings representing rule's locationPaths as well as the parser's current state as keys for hash lookups to see if there were any matches.

This requires a toString call on the StringBuilder used to represent the current parser path 4x on every tag.

As seen with the profiled parser data, this lead to an enormous amount of waste, even though SJXP was still performing fairly fast, it wasn't immediately obvious.

NEW DESIGN

To reduce the amount of memory and CPU waste, the following design will be employed.

The internal "Location" class used to represent the parser's location will be modified to have a hashCode() method that uses the identical calculation to String's hashCode; this way a Location with the same characters (path) of a matching IRule will have identical hashCode values.
IRule's will be stored in their hash lookups using the hashCode() of their locationPath's instead of the String locationPath's themselves.

Lookups for matching IRules based on the parser's location will be done using Integer hashCodes instead of needing to toString() the entire location every time.

Answer 3 · 2011-03-10T20:50:02.000Z

Implementing this change dropped memory usage 50% and CPU usage significantly, unfortunately a ton of time is now spent inside of the Integer.valueOf() method auto-boxing the int returned from hashCode.

IMPROVED DESIGN

Keeping the same design, but instead implementing an Integer hashCode cache that is a simple Integer[] of a decent size (it's size only needs to match the number of unique paths that exist in the document... probably never more than 64, so default is 512 to be more than adequate)

Whenever an Integer for the hashCode is requested, the hash code cache is checked at the index of the hash code value modulo to cache length (Absolute value), if null is found, an Integer to represent that hashcode value is created, stored and returned.

We make sure we don't get a collision of values and either return the cached version or create a new instance of Integer.

This way we never create more than hashCodeCache.length number of Integers regardless of how huge the document is.

Answer 4 · 2011-03-10T20:54:34.000Z

Results

Profiling the new design and I see the following:

HUGE decrease in memory usage/object creation. The top 3 usages of memory are all directly tied to the underlying XML Pull Parser and no longer occupied by SJXP. The memory usage of SJXP has been decreased to an insignificant amount ontop of the base parser.

Benchmarking the "Benchmark" class with HPROF went from 85MB of memory created by SJXP down to around 1MB.

CPU benchmark results show similar improvements.

The overhead of SJXP is razor thin now with 1 other optimization to make it thinner on it's way :)

Answer 5 · 2011-03-10T20:58:45.000Z

Benchmark results after the optimization:

Processed 10845 bytes, parsed 60 XML elements in 8ms
Processed 130641 bytes, parsed 258 XML elements in 35ms
Processed 272934 bytes, parsed 200 XML elements in 57ms
Processed 301834 bytes, parsed 50 XML elements in 28ms
Processed 722362 bytes, parsed 300 XML elements in 49ms
Processed 1633334 bytes, parsed 2108 XML elements in 252ms
Processed 10625983 bytes, parsed 41427 XML elements in 234ms