std::fill performance issues under visual c++
Closed this issue · 7 comments
Hello, sorry for polluting the other thread: I just subscribed to GitHub for this very issue and didn't realize I was replying to another issue :D
Here is my original message:
Hello, great library!
I just got back to my old raytracing project (abandoned 10 years ago) and decided it would be nice to work on its performance, which was horrible at best. I started replacing my quick BIH implementation with a BVH, but then decided to switch to an external lib and found this one. I got it working, but performance was exactly the same as with my BIH:
bvh::LocallyOrderedClusteringBuilder<bvh::Bvh, int> builder(bvh);
builder.build(bvh::BoundingBox(MasterBVHPrimitive::toVector3(worldbb.min), MasterBVHPrimitive::toVector3(worldbb.max)), bboxes, centers, shapes.size());
bvh::LeafCollapser<bvh::Bvh> collapser(bvh);
collapser.collapse();
I then profiled the execution and found that 35% of the time was being spent inside std::fill, which I tracked back to:
struct Vector {
Scalar values[N];
Vector() = default;
bvh__always_inline__ Vector(Scalar s) { std::fill(values, values + N, s); }
As I'm working with 3-dimensional vectors, I then replaced the call to std::fill with:
values[0] = values[1] = values[2] = s;
My test scene rendering time went from 8 seconds to 3!
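The hard-coded assignment above is only valid when N is 3. A minimal sketch of a loop-based broadcast constructor that stays generic in N is shown here, assuming a stand-in type called SketchVector and a helper called broadcast_with_fill (both hypothetical names, not part of the library):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for the library's Vector; names are illustrative only.
template <typename Scalar, std::size_t N>
struct SketchVector {
    Scalar values[N];

    SketchVector() = default;

    // Broadcast constructor written as a plain loop instead of std::fill.
    // Unlike `values[0] = values[1] = values[2] = s`, it is valid for any N.
    SketchVector(Scalar s) {
        for (std::size_t i = 0; i < N; ++i)
            values[i] = s;
    }
};

// Reference version that fills the array with std::fill, as in the excerpt above.
template <typename Scalar, std::size_t N>
SketchVector<Scalar, N> broadcast_with_fill(Scalar s) {
    SketchVector<Scalar, N> v;
    std::fill(v.values, v.values + N, s);
    return v;
}
```

Both variants produce the same values, so swapping one for the other should not change results. Whether the loop is actually faster depends on the compiler and on the optimization flags in use.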
Am I using the lib the wrong way? Is there an option that would disable/improve that initialization? Would my change break something somewhere?
Thank you!
And here is your reply:
@cignox1 Please create a separate issue for this, so as not to pollute this one. I suspect that you forgot to compile with optimizations on, or that your compiler is too old and effectively terrible at optimizing very simple code (see this for an example of how a decent compiler optimizes std::fill). Consider switching to gcc (Mingw64 if you're on Windows) or clang. If you use CMake, set CMAKE_BUILD_TYPE to Release.
I'm using the latest MS compiler (the one shipped with Visual Studio 2019). Optimizations are "on", but I will look into it: I have not used VS in years...
It appears MSVC is just, as usual, a terrible compiler, and is for some reason not generating good code for that: https://godbolt.org/z/aW9c53 . I recommend switching to clang and submitting a bug report to Microsoft at this point. I could write a for loop to do this instead, but there are probably other spots where MSVC is incapable of generating decent code. Besides, I can't just decide to downgrade the code quality because some people at Microsoft can't write decent compilers.
The right way forward if you want good performance is to switch to clang. There is an official build for MSVC: https://devblogs.microsoft.com/cppblog/clang-llvm-support-in-visual-studio/ . You'll get a standard C++ compliant compiler that generates decent code and is reliable. As an added bonus you might be able to get OpenMP 3.0 support which will enable the parallel algorithms in this library.
Hello, you were right! :D My project is an old one, and the optimization flag was set to Ox. Now O2 exists, much better, and this completely eliminates the problem :D
Thank you!
I'll definitely check out clang though, thank you for the tip!
I have noticed after some refactoring that some code was creating Vectors instead of scalar values. Commit 60a0e01 fixes this. This should hopefully completely eliminate your problem, even in /Ox.
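The commit itself is not reproduced here. As a rough illustration of the general pattern only, the sketch below shows how building a temporary vector from a scalar adds a broadcast that a direct scalar operation avoids. Vec3, its broadcasts counter, scale_via_vector and scale_via_scalar are all hypothetical names chosen for this example, not the library's API:

```cpp
#include <cstddef>

// Hypothetical 3D vector; not the library's Vector type.
struct Vec3 {
    float values[3];
    static int broadcasts;  // counts broadcast-constructor calls, for illustration only

    Vec3() = default;

    // Broadcast constructor: copies the scalar into every component.
    Vec3(float s) {
        ++broadcasts;
        for (std::size_t i = 0; i < 3; ++i)
            values[i] = s;
    }
};

int Vec3::broadcasts = 0;

// Component-wise product of two vectors.
inline Vec3 operator*(const Vec3& a, const Vec3& b) {
    Vec3 r;
    for (std::size_t i = 0; i < 3; ++i)
        r.values[i] = a.values[i] * b.values[i];
    return r;
}

// Scaling through a temporary vector: the scalar is broadcast first.
inline Vec3 scale_via_vector(const Vec3& v, float s) {
    return v * Vec3(s);
}

// Scaling with the scalar directly: no temporary vector is built.
inline Vec3 scale_via_scalar(const Vec3& v, float s) {
    Vec3 r;
    for (std::size_t i = 0; i < 3; ++i)
        r.values[i] = v.values[i] * s;
    return r;
}
```

Both functions return the same result, but only scale_via_vector goes through the broadcast constructor. An optimizing compiler will often remove the temporary anyway, so the difference mostly matters in weakly optimized builds.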