Performance regression when using Handlebars with large documents in 4.3.1 release (cause: memory re-allocation behavior change in commons text 10)
aschwarte10 opened this issue · 4 comments
After upgrading from Handlebars from 4.3.0 to 4.3.1 we stumbled into a performance regression when parsing large documents.
The analysis revealed that this is a side effect of an actual bug fix in commons-text 10: in the org.apache.commons.text.TextStringBuilder
(which is used as a basis of FastStringWriter
) the ensureCapacity
method was fixes to not double the buffer size (on each re-allocation). The effect is that with each append through handlebars rendering heavy memory re-allocation takes place (which gets slower with increasing size of the buffer / document).
In our case we tried the following workarounds / solutions:
a) switch to regular Java StringWriter
Instead of com.github.jknack.handlebars.Template.apply(Object)
we now invoke the variant as follows
Context context = params instanceof Context ? (Context) params : Context.newContext(params);
StringWriter writer = new StringWriter();
template.apply(context, writer);
return writer.toString();
b) use a modified FastStringWriter
implementation that in each of its "append" methods invokes the following method to increase the buffer size (with some additional capacity)
/**
* Make sure that there is enough capacity in the buffer. If there isn't
* increase the capacity for the new content plus some additional space
* corresponding to {@link #BUFFER_SIZE}.
*
* @param newContentLength
*/
private void ensureCapacity(int newContentLength) {
if (buffer.capacity() - buffer.length() > newContentLength) {
return;
}
int currentSize = buffer.capacity();
// make sure to reallocate memory for the new content + some additional space
int desiredCapacity = currentSize + newContentLength + BUFFER_SIZE;
buffer.ensureCapacity(desiredCapacity);
}
We found that the plain StringWriter
was still better in performance in our concrete test case.
Maybe there are further optimizations in ensureCapacity
possible (i..e using the knowledge about the compile template and increasing the buffer size close to what is actually needed)
Quick addition to @aschwarte10 's comment above: the good results for the Java StringWriter are with JDK 17. There have been some changes between e.g. JDK 11 and 17 there, so the performance of the plain StringWriter might depend on the underlying Java version...
I think we may be seeing something similar... we've got a large template (4MB).
In a docker container running in Docker Desktop on my box, rendering it precompiled takes > 2,000 milliseconds.
On the same box but running the JVM outside docker, rendering it precompiled takes < 30 milliseconds.
Handlebars 4.3.1
Docker env:
openjdk version "17.0.7" 2023-04-18
OpenJDK Runtime Environment (build 17.0.7+7-alpine-r1)
OpenJDK 64-Bit Server VM (build 17.0.7+7-alpine-r1, mixed mode, sharing)
I'm surprised that it can be 2 orders of magnitude difference between in and out of a docker container... but taking a JFR recording it does look like a lot of the time is spent in reallocate:
java.util.Arrays.copyOf(Arrays.java:3633)
at org.apache.commons.text.TextStringBuilder.reallocate(TextStringBuilder.java:2610)
at org.apache.commons.text.TextStringBuilder.ensureCapacity(TextStringBuilder.java:1829)
at org.apache.commons.text.TextStringBuilder.append(TextStringBuilder.java:646)
at org.apache.commons.text.TextStringBuilder.append(TextStringBuilder.java:619)
at org.apache.commons.text.TextStringBuilder.append(TextStringBuilder.java:531)
at com.github.jknack.handlebars.internal.FastStringWriter.append(FastStringWriter.java:55)
at com.github.jknack.handlebars.internal.Variable.merge(Variable.java:148)
at com.github.jknack.handlebars.internal.BaseTemplate.apply(BaseTemplate.java:126)
at com.github.jknack.handlebars.internal.TemplateList.merge(TemplateList.java:95)
at com.github.jknack.handlebars.internal.BaseTemplate.apply(BaseTemplate.java:126)
at com.github.jknack.handlebars.internal.BaseTemplate.apply(BaseTemplate.java:114)
at com.github.jknack.handlebars.internal.ForwardingTemplate.apply(ForwardingTemplate.java:100)
I'm an idiot, I wasn't actually evaluating the template locally - when I did it had similar performance characteristics to the behaviour in Docker.
The issue is that the buffer char[]
is only 1,600 characters by default, and as you evaluate a template it only increases by the amount you can append to it each time. In our case we end up with a 4,000,000 chars, and it mostly increments by 10 to 50 characters, so we literally copy the data tens of thousands of times.
I can "fix" it by setting the system property -Dhbs.buffer=4200000
- but allocating a 4MB buffer for every template seems very wasteful when most are much much smaller. If the buffer doubled each time it was too small that might be a better solution? Or if it used the template size as a starting point?