Binder performance could be improved or maybe short circuited in some cases

Question

Binder performance could be improved or maybe short circuited in some cases

dsyer opened this issue 6 years ago · 10 comments

Spring Boot uses Binder.bind() at a very early stage in quite a lot
of places. SpringApplication itself, LoggingApplicationListener,
ConfigFileApplicationListener and AnsiOutputApplicationListener
all get used early and their combined use of Binder adds up to more
than 5% of the startup time in BuncApplication. If Binder could be
replaced with a simple call to Environment.getProperty() it would be
much faster - we tried this with ConfigFileApplicationListener with
positive results, but that caused test failures in Spring Boot, so the
binder is doing something clever that is necessary in at least some
corner cases.

Benchmark for extracting a String[] from an Environment property:

Benchmark                Mode  Cnt       Score       Error  Units
BinderBenchmark.binder  thrpt    5    5384.461 ±   359.367  ops/s
BinderBenchmark.direct  thrpt    5  791322.965 ± 22667.655  ops/s

The "direct" sample uses Environment.getProperty(), and the "binder
sample uses Binder.bind().

See also #13436.

Answer 1 · 2018-06-25T08:38:28.000Z

Here's another benchmark (different hardware, so slower for the same use cases):

Benchmark                Mode  Cnt       Score       Error  Units
BinderBenchmark.binder  thrpt    5    1942.687 ±   333.568  ops/s
BinderBenchmark.direct  thrpt    5  286815.982 ± 36887.052  ops/s
BinderBenchmark.map     thrpt    5   16381.371 ±  3743.830  ops/s

Here the "map" benchmark uses a MapConfigurationPropertySource instead of the full Environment (I noticed this pattern in XADataSourceAutoConfiguration). It's only 10x faster, but that's a step in the right direction. Maybe that's a hint about how to improve it, especially for the early bindings.

It's almost as fast if you just wrap the Environment in a ConfigurationPropertySource that simply mirrors Environment.getProperty():

Benchmark                 Mode  Cnt      Score     Error  Units
BinderBenchmark.wrapped  thrpt    5  15568.784 ± 580.370  ops/s

Answer 2 · 2018-06-25T13:31:56.000Z

More results where this time I short circuit ArrayBinder.bindAggregate() to just return the value directly if it finds one that matches the spec:

Benchmark                 Mode  Cnt       Score       Error  Units
BinderBenchmark.binder   thrpt    5    2889.135 ±    78.829  ops/s
BinderBenchmark.direct   thrpt    5  301019.875 ± 37135.369  ops/s
BinderBenchmark.map      thrpt    5   53207.785 ±  2309.443  ops/s
BinderBenchmark.wrapped  thrpt    5   42175.287 ±  2014.766  ops/s

This has boosted the performance of the "map" and "wrapped" benchmarks but the "binder" one is still hampered by something else, probably SpringConfigurationSource judging by the flame graphs (and the fact that it is not in the code path of the faster samples).

Answer 3 · 2018-06-25T13:57:22.000Z

Another round of experiments:

Benchmark                 Mode  Cnt       Score       Error  Units
BinderBenchmark.map      thrpt    5  103391.416 ±  2656.791  ops/s
BinderBenchmark.wrapped  thrpt    5   70869.648 ±  5895.043  ops/s

This time I short circuited BindConverter.convert if the source type is already assignable to the target type (i.e. there is no conversion to do).

Answer 4 · 2018-06-26T10:34:59.000Z

Comment out this line, where the binder spends a lot of time looping through system properties and environment variables, finding nothing:

Benchmark                 Mode  Cnt       Score       Error  Units
BinderBenchmark.binder   thrpt    5   34537.571 ±   613.386  ops/s

So, while commenting out that line obviously breaks something else, it wasn't needed for this use case and it speeds up the binder by a factor of 10. Maybe an optimization opportunity there.

Answer 5 · 2018-06-26T20:20:32.000Z

Here's another one. Not sure if this makes a functional difference, but BindConverter creates a new instance of TypeConverterConversionService for every instance, and it looks like maybe it doesn't need to if the propertyEditorInitializer is null. Opting out of that (if the initializer is null) speeds things up a bit as well:

Benchmark                 Mode  Cnt       Score       Error  Units
BinderBenchmark.binder   thrpt    5    3016.550 ±    97.695  ops/s
BinderBenchmark.direct   thrpt    5  299785.266 ± 60197.403  ops/s
BinderBenchmark.map      thrpt    5  163740.703 ± 25689.940  ops/s
BinderBenchmark.wrapped  thrpt    5   99965.290 ±  3146.096  ops/s

Answer 6 · 2018-06-29T11:37:56.000Z

BindConverter creates a new instance of TypeConverterConversionService for every instance, and it looks like maybe it doesn't need to if the propertyEditorInitializer is null

I think we can go a step further. Each time a Context is created, it creates the BindConverter using the conversion service and property editor initializer from the Binder. As far as I can tell the conversion service is always the same ApplicationConversionService instance and the property editor initializer is either null or the lambda from ConfigurationPropertiesBinder. In short, I think there's only a need for two different BindConverter instances to have been created.

With that said, I'm not sure how much we will gain. In a minimal app (just a dependency on spring-boot-starter-web) only 10 Context instances are created. I think we need to be wary of looking at microbenchmarks that focus heavily on the binder as they may not translate into much when launching an app.

Answer 7 · 2018-06-29T12:19:23.000Z

That's true. I did say in the beginning that I thought around 5% of startup time was being used up. If Binder was 10x faster that could be as little as 0.5%, so that would be worthwhile. Seems like the TypeConverterConversionService can be optimized easily, but won't get us the 10x (that's the value of the focused benchmarks).

Answer 8 · 2018-09-21T04:53:03.000Z

I've created a new issue (#14562) based on Andy's comment above.

Answer 9 · 2018-10-18T08:54:30.000Z

Things are much better now. It seems that Binder.bind() to a String[] is 2-3 times faster, and the "map" and "wrapped" benchmarks are 10x faster:

Benchmark                 Mode  Cnt       Score       Error  Units
BinderBenchmark.binder   thrpt    5    6035.631 ±   929.405  ops/s
BinderBenchmark.direct   thrpt    5  778681.617 ± 12206.544  ops/s
BinderBenchmark.map      thrpt    5  182403.289 ±  2743.654  ops/s
BinderBenchmark.wrapped  thrpt    5  122462.052 ±  3120.594  ops/s

The only place Binder shows up as a hotspot on startup of a vanilla empty Boot app now is in LoggingApplicationListener.setLogLevels() (about 2.5%). If we could squish that one it would be awesome - especially as 99% of the time there are 0 properties in logging.level.* and you still pay the cost.

Answer 10 · 2018-10-26T19:08:31.000Z

I've replaced some Stream calls with simple Iterators but I'm not sure there's much we can do for 2.1. Some unscientific System.nanoTime logging on the Tomcat sample shows LoggingApplicationListener.setLogLevels() takes about 4 milliseconds when there's no entries.

We can have another go in 2.2.