Binder performance could be improved or maybe short circuited in some cases
dsyer opened this issue · 10 comments
Spring Boot uses Binder.bind()
at a very early stage in quite a lot
of places. SpringApplication
itself, LoggingApplicationListener
,
ConfigFileApplicationListener
and AnsiOutputApplicationListener
all get used early and their combined use of Binder
adds up to more
than 5% of the startup time in BuncApplication
. If Binder
could be
replaced with a simple call to Environment.getProperty()
it would be
much faster - we tried this with ConfigFileApplicationListener
with
positive results, but that caused test failures in Spring Boot, so the
binder is doing something clever that is necessary in at least some
corner cases.
Benchmark for extracting a String[]
from an Environment
property:
Benchmark Mode Cnt Score Error Units
BinderBenchmark.binder thrpt 5 5384.461 ± 359.367 ops/s
BinderBenchmark.direct thrpt 5 791322.965 ± 22667.655 ops/s
The "direct" sample uses Environment.getProperty()
, and the "binder
sample uses Binder.bind()
.
See also #13436.
Here's another benchmark (different hardware, so slower for the same use cases):
Benchmark Mode Cnt Score Error Units
BinderBenchmark.binder thrpt 5 1942.687 ± 333.568 ops/s
BinderBenchmark.direct thrpt 5 286815.982 ± 36887.052 ops/s
BinderBenchmark.map thrpt 5 16381.371 ± 3743.830 ops/s
Here the "map" benchmark uses a MapConfigurationPropertySource
instead of the full Environment
(I noticed this pattern in XADataSourceAutoConfiguration
). It's only 10x faster, but that's a step in the right direction. Maybe that's a hint about how to improve it, especially for the early bindings.
It's almost as fast if you just wrap the Environment
in a ConfigurationPropertySource
that simply mirrors Environment.getProperty()
:
Benchmark Mode Cnt Score Error Units
BinderBenchmark.wrapped thrpt 5 15568.784 ± 580.370 ops/s
More results where this time I short circuit ArrayBinder.bindAggregate()
to just return the value directly if it finds one that matches the spec:
Benchmark Mode Cnt Score Error Units
BinderBenchmark.binder thrpt 5 2889.135 ± 78.829 ops/s
BinderBenchmark.direct thrpt 5 301019.875 ± 37135.369 ops/s
BinderBenchmark.map thrpt 5 53207.785 ± 2309.443 ops/s
BinderBenchmark.wrapped thrpt 5 42175.287 ± 2014.766 ops/s
This has boosted the performance of the "map" and "wrapped" benchmarks but the "binder" one is still hampered by something else, probably SpringConfigurationSource
judging by the flame graphs (and the fact that it is not in the code path of the faster samples).
Another round of experiments:
Benchmark Mode Cnt Score Error Units
BinderBenchmark.map thrpt 5 103391.416 ± 2656.791 ops/s
BinderBenchmark.wrapped thrpt 5 70869.648 ± 5895.043 ops/s
This time I short circuited BindConverter.convert
if the source type is already assignable to the target type (i.e. there is no conversion to do).
Comment out this line, where the binder spends a lot of time looping through system properties and environment variables, finding nothing:
Benchmark Mode Cnt Score Error Units
BinderBenchmark.binder thrpt 5 34537.571 ± 613.386 ops/s
So, while commenting out that line obviously breaks something else, it wasn't needed for this use case and it speeds up the binder by a factor of 10. Maybe an optimization opportunity there.
Here's another one. Not sure if this makes a functional difference, but BindConverter
creates a new instance of TypeConverterConversionService
for every instance, and it looks like maybe it doesn't need to if the propertyEditorInitializer
is null. Opting out of that (if the initializer is null) speeds things up a bit as well:
Benchmark Mode Cnt Score Error Units
BinderBenchmark.binder thrpt 5 3016.550 ± 97.695 ops/s
BinderBenchmark.direct thrpt 5 299785.266 ± 60197.403 ops/s
BinderBenchmark.map thrpt 5 163740.703 ± 25689.940 ops/s
BinderBenchmark.wrapped thrpt 5 99965.290 ± 3146.096 ops/s
BindConverter
creates a new instance ofTypeConverterConversionService
for every instance, and it looks like maybe it doesn't need to if thepropertyEditorInitializer
is null
I think we can go a step further. Each time a Context
is created, it creates the BindConverter
using the conversion service and property editor initializer from the Binder
. As far as I can tell the conversion service is always the same ApplicationConversionService
instance and the property editor initializer is either null
or the lambda from ConfigurationPropertiesBinder
. In short, I think there's only a need for two different BindConverter
instances to have been created.
With that said, I'm not sure how much we will gain. In a minimal app (just a dependency on spring-boot-starter-web
) only 10 Context
instances are created. I think we need to be wary of looking at microbenchmarks that focus heavily on the binder as they may not translate into much when launching an app.
That's true. I did say in the beginning that I thought around 5% of startup time was being used up. If Binder
was 10x faster that could be as little as 0.5%
, so that would be worthwhile. Seems like the TypeConverterConversionService
can be optimized easily, but won't get us the 10x (that's the value of the focused benchmarks).
Things are much better now. It seems that Binder.bind()
to a String[]
is 2-3 times faster, and the "map" and "wrapped" benchmarks are 10x faster:
Benchmark Mode Cnt Score Error Units
BinderBenchmark.binder thrpt 5 6035.631 ± 929.405 ops/s
BinderBenchmark.direct thrpt 5 778681.617 ± 12206.544 ops/s
BinderBenchmark.map thrpt 5 182403.289 ± 2743.654 ops/s
BinderBenchmark.wrapped thrpt 5 122462.052 ± 3120.594 ops/s
The only place Binder
shows up as a hotspot on startup of a vanilla empty Boot app now is in LoggingApplicationListener.setLogLevels()
(about 2.5%). If we could squish that one it would be awesome - especially as 99% of the time there are 0 properties in logging.level.*
and you still pay the cost.
I've replaced some Stream
calls with simple Iterators
but I'm not sure there's much we can do for 2.1. Some unscientific System.nanoTime
logging on the Tomcat sample shows LoggingApplicationListener.setLogLevels()
takes about 4 milliseconds when there's no entries.
We can have another go in 2.2.