tom-seddon/b2

BBCMicro_Update.cpp takes too long to compile

Closed this issue · 5 comments

Xcode takes 3.5 minutes even for a debug build! The VC++ build time is OK for a debug build, though it's still noticeable.

The basic approach of having endless customized versions of the update function should ideally stay, but iteration time is also important.

The Xcode indexing might be contributing to the actual times mentioned here. A ninja build is more like 2.5 minutes. From 2 separate runs after touch'ing BBCMicro_Update.cpp:

  • ninja 156.81s user 17.52s system 111% cpu 2:36.61 total
  • ninja 152.16s user 16.12s system 114% cpu 2:26.37 total

The UPDATE1 macro relies on the compiler detecting repeated identical instantiations. VC++ seems to do this, with the build time seeming proportional the number of non-identical instantiations rather than the table size (though this is just my gut impression rather than something I have measured). Is clang doing that? What if they're eliminated?

One alternative that would test this: (to be combined with a runtime fixup step on startup, that would fill the null entries with the appropriate function pointer for the corresponding normalized flags)

#define UPDATE1(N) (::GetNormalizedBBCMicroUpdateFlags(N)==(N)?&BBCMicro::UpdateTemplated<(N)>:nullptr)

But that did not really improve things (to say the least):

ninja  459.32s user 25.19s system 103% cpu 7:48.93 total

This seems a bit ridiculous. Even if constexpr evaluation is slow, the number of evaluations should be the same. Maybe the compiler is generating a huge pile of code to do the table initialisation at runtime now?

To be continued.

Simpler repro case for my laptop:

time /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DBUILD_TYPE_Debug=1 -D_GNU_SOURCE -I/Users/tom/github/b2/src/beeb/include -I/Users/tom/github/b2/src/shared/h -I/Users/tom/github/b2/src/6502/h -I/Users/tom/github/b2/submodules/salieri -g -DASSERT_ENABLED=1 -std=gnu++17 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk -mmacosx-version-min=12.7   -Wall -Wuninitialized -Winit-self -pedantic -Wsign-conversion -Wunused-result -Wno-invalid-offsetof -Werror=incompatible-pointer-types -Werror=int-conversion -Werror=return-type -Wno-pragma-pack -Wsign-compare -Wsign-conversion -Wsometimes-uninitialized -Wno-overlength-strings -Wunused-parameter -MD -MT src/beeb/CMakeFiles/beeb_lib.dir/src/BBCMicro_Update.cpp.o -MF src/beeb/CMakeFiles/beeb_lib.dir/src/BBCMicro_Update.cpp.o.d -o src/beeb/CMakeFiles/beeb_lib.dir/src/BBCMicro_Update.cpp.o -c /Users/tom/github/b2/src/beeb/src/BBCMicro_Update.cpp
  • 132.17s user 9.14s system 94% cpu 2:29.49 total
  • 129.13s user 8.73s system 96% cpu 2:22.14 total

This kind of thing didn't make any obvious difference. Which of course it wouldn't really, as the compiler has to detect duplicated instantiations, because templates wouldn't work very effectively if it didn't.

-#define UPDATE1(N) &BBCMicro::UpdateTemplated<::GetNormalizedBBCMicroUpdateFlags(N)>
+#define UPDATE1(N) &BBCMicro::UpdateTemplated<(N)==::GetNormalizedBBCMicroUpdateFlags(N)?(N):INVALID_UPDATE_FLAGS>

+template <>
+uint32_t BBCMicro::UpdateTemplated<INVALID_UPDATE_FLAGS>(VideoDataUnit *, SoundDataUnit *) {
+    return 0;
+}
+

Splitting the update function instantations across 4 files makes a useful improvement on Windows and macOS - good enough for me.

The build time still sucks with gcc on Linux.