palacaze/sigslot

Compile-time and number of sections

NicolasLacombe opened this issue · 6 comments

Hi,

Thanks for this library, which is well conceived and very useful.

I'm on Windows using Visual Studio 2017, and I noticed that when binding a signal to a lot of different lambdas, you end up with quite slow compile times and potentially big *.obj files with a large number of symbols. So you quickly need to raise the section limit of the object file with the /bigobj compiler option.

I looked into it a bit, and noticed that this seems to be related to the fact that slots are templated on the callable. So every new lambda or callback passed to a signal ends up generating a new, specific slot type, with no chance of reuse by the compiler. In the end this seems to generate quite a lot of symbols per lambda.
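The effect can be reproduced without sigslot. Below is a minimal sketch, assuming a hypothetical `toy_slot` template that stands in for a slot class templated on its callable (it is not sigslot's actual slot type). Two lambdas with identical signatures still have distinct closure types, so each one names a different `toy_slot` specialization.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a slot class templated on its callable.
// This is an illustration only, not sigslot's real implementation.
template <typename Callable>
struct toy_slot {
    explicit toy_slot(Callable c) : func(std::move(c)) {}
    void operator()(int v) { func(v); }
    Callable func;
};

// Returns true when both callables map to the same toy_slot specialization,
// i.e. when the compiler could reuse a single instantiation for both.
template <typename A, typename B>
bool share_slot_type(A, B) {
    return std::is_same<toy_slot<A>, toy_slot<B>>::value;
}
```

Calling `share_slot_type` with two separately written `[](int) {}` lambdas returns false, while passing the same lambda object twice returns true.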

I guess this is not really an issue or a bug, as using the /bigobj option seems to be completely acceptable nowadays.

I don't really have any suggestion so far on how to improve this. It's probably quite challenging to abstract the type of a callable without using templates, but I wanted to start the conversation and see what comes back.

Thanks,

Nicolas.

Hi Nicolas,

Thank you for mentioning that. I had not noticed this behavior despite heavy use of sigslot in a few projects. Do you have a self-contained test case that exhibits this problem? My design decisions were mostly directed towards simplicity and the avoidance of undue techniques, such as type erasure, that are unneeded here. The side effect is increased template instantiation. If you often reuse the same types, one option would be explicit template instantiation. I am not sure it would help, though.

I am also glad it works on Visual Studio, as I did not test on that compiler at all. Did you need to adjust the CMake configuration in any way?

Hi, thanks for the answers.

Yes, it worked out of the box with Visual Studio 2017 version 15.9.7.

I've forked the repository and added the example "lambdas".
I can't compile this example in Debug on Windows without the /bigobj option, because the number of sections generated is greater than 2^16. It's a simple example that only generates 400 lambdas with the same signature. Unfortunately, on Visual Studio, every lambda becomes a new class, even though their signatures are the same. I don't know if Clang and GCC behave better here; it might be interesting to try.

I've reached the limit in Release too with 1500+ connections, and the compilation took several minutes.

GCC 8 takes 30 seconds to compile the 400 lambdas example, the executable size is about 1.7 MB and 700 kB after stripping in Release mode. If I wrap every lambda in a std::function, the compile time drops to 9 seconds and the executable size to 300 kB before stripping. In Debug mode the compilation takes about as long but the executable is obviously larger.

Note that by definition, every lambda expression has its own unique closure type, and thus its own underlying class and symbols. This is not MSVC specific, and the only way around it is to use function objects, pointers to member functions or free functions, which avoids defining loads of symbols.
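To illustrate the difference, here is a small sketch that counts distinct template instantiations at run time. The names `toy_connect`, `on_open` and `on_close` are hypothetical and only mimic a connect function templated on its callable. Free functions with the same signature decay to the same pointer type and share one instantiation, whereas each lambda adds a new one.

```cpp
#include <cassert>
#include <cstddef>

// Global counter of distinct toy_connect instantiations that were exercised.
inline std::size_t& instantiation_count() {
    static std::size_t n = 0;
    return n;
}

// The static local below exists once per Callable type, so the counter is
// bumped only the first time a given instantiation is executed.
template <typename Callable>
void note_instantiation() {
    static const bool once = (++instantiation_count(), true);
    (void)once;
}

// Hypothetical connect-like function, templated on the callable type.
template <typename Callable>
void toy_connect(Callable) {
    note_instantiation<Callable>();
}

// Two free functions with the same signature: both have type void(*)(int)
// once converted to a function pointer.
inline void on_open(int) {}
inline void on_close(int) {}
```

Connecting `&on_open` and `&on_close` raises the counter by one in total, while connecting two separate lambdas of the same signature raises it by two.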

My test with std::function was meant to prevent sigslot from instantiating several template functions and classes in the library. From this test I can conclude that defining 400 unique lambdas costs less than 9 seconds, and instantiating slots for 400 different lambdas costs about 20 seconds.
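A rough sketch of that kind of test, assuming a hypothetical `connects_as_handler` template in place of sigslot's connect function: once each lambda is wrapped in `std::function<void(int)>`, the template only ever sees one callable type. Note that `std::function` still generates some code per lambda for its own type erasure, so the lambdas themselves are not free.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Single erased type shared by every wrapped callback.
using handler = std::function<void(int)>;

// Hypothetical connect-like template: reports whether the callable type it
// was instantiated with is the shared handler type.
template <typename Callable>
bool connects_as_handler(Callable&&) {
    return std::is_same<typename std::decay<Callable>::type, handler>::value;
}

// Wraps any callable taking an int; the result always has type handler,
// whatever the closure type of the argument is.
template <typename F>
handler wrap(F f) {
    return handler(std::move(f));
}
```

Passing a bare lambda to `connects_as_handler` yields false, while passing `wrap(lambda)` yields true for every lambda, at the cost of an extra indirection on each call.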

There are ways to mitigate template instantiation abuse. I will look into it but I anticipate few (if any) gains from it.

So, looking a bit more into it I realized that the main culprit is std::shared_ptr and not my own template code. std::make_shared instantiates a lot, this is quite the code bloat...
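One possible way to cut down on these instantiations is sketched below. It is an assumption for illustration, not necessarily what the workaround does, and `slot_base`, `slot_impl`, `make_slot_combined` and `make_slot_lean` are hypothetical names. The first factory calls `std::make_shared` on the concrete type, which instantiates shared-pointer machinery for every callable. The second only ever builds a `std::shared_ptr<slot_base>` from a `slot_base*`, so that part is shared by all callables.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type-erased base. The virtual destructor is required because
// the lean variant deletes the object through a slot_base pointer.
struct slot_base {
    virtual ~slot_base() = default;
    virtual int call(int v) = 0;
};

// Concrete slot, one specialization per callable type.
template <typename Callable>
struct slot_impl final : slot_base {
    explicit slot_impl(Callable c) : func(std::move(c)) {}
    int call(int v) override { return func(v); }
    Callable func;
};

// Variant A: std::make_shared is instantiated for each slot_impl<Callable>.
template <typename Callable>
std::shared_ptr<slot_base> make_slot_combined(Callable c) {
    return std::make_shared<slot_impl<Callable>>(std::move(c));
}

// Variant B: the shared_ptr is always constructed from a slot_base*, so the
// shared_ptr code does not depend on Callable.
template <typename Callable>
std::shared_ptr<slot_base> make_slot_lean(Callable c) {
    slot_base* raw = new slot_impl<Callable>(std::move(c));
    return std::shared_ptr<slot_base>(raw);
}
```

The tradeoff is that variant B typically performs two allocations (the object and the control block), where `std::make_shared` can usually combine them into one.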

I pushed a workaround, off by default, that you can test by passing -DREDUCE_CODE_SIZE=ON to CMake. I would not call it a fix because there are tradeoffs. It actually makes #5 worse!

I am keeping the bug open because I think it can be improved further.

Cool stuff! Thanks a lot. I tried it briefly and can confirm that the compilation time and code size are also greatly reduced on Visual Studio.

Thanks for looking into it, this is a nice addition which is worth the price in my opinion.

I renamed the CMake configuration key and associated preprocessor macro to SIGSLOT_REDUCE_COMPILE_TIME. I had a hard time deciding whether to make it the default or not, and finally decided against it. I might change my mind later on, as it can affect compilation times drastically.

There may be other ways to decrease compilation times further, for instance by getting rid of std::vector and implementing a custom container of smart pointers, but I worry it would affect code readability so I won't pursue it further for now.

If you notice a regression along the way or in the future, do not hesitate to reopen this bug report.

Thank you for noticing this.