Random shuffling isn't thread-safe and may cause issues with the linker
here-abarany opened this issue · 9 comments
The current random shuffling that's enabled by default declares a static mt19937 randGenerator(9001)
in Triangulation.h. By declaring the variable as static
, this has two main effects:
- The variable will be shared across threads. This means that performing multiple Delaunay operations across different threads will interfere with the shuffling, and in worst case could break the random number distribution in the generator as the state is updated unsafely.
- A separate instance will be created in each .cpp file that includes Triangulation.h. If this happens in multiple .cpp files in a library, I'm not quite sure what would happen when the linker removes duplicate template instantiations that refer to the
randGenerator
object.
One option is to avoid a global object and create a stack object instead. Given that you're re-seeding randGenerator
each time this would have the same performance. randGenerator
will be ~2.5 KB for its internal state, but this cost could be mitigated by moving the instance to the random_shuffle()
function so it's popped off the stack before any recursive operations occur.
A couple other things worth noting:
- The usage of
randGenerator() % (i + 1)
introduces bias. For this usage it probably doesn't really matter, but uniform_int_distribution would avoid any biasing. The implementation may change between systems, though, so an argument could be made to leave it as-is if you wish to keep the behavior truly repeatable. (or use one of these simple implementations, the "Debiased Modulo (Once)" method seems to be a good compromise between simplicity and speed) - Initializing Mersenne twister is quite expensive. I could see this constant cost becoming a factor for someone performing Delaunay on many small meshes. The available list of generators in the STL is unfortunately a bit behind the times, but today it's possible to have a non-garbage generator without the massive state of Mersenne twister. Some options are analyzed on this page, and given that you're only using a single seed for one operation SplitMix64 would be trivial to integrate and should have well distributed results while only using a single 64-bit number.
Thanks for opening the issue @here-abarany. I will take a closer look when I’m back at the keyboard.
@artem-ogre what are your plans for the changes on the 97-use-splitmix64-prng branch? I noticed that some of the tests are failing, presumably the "ground truth" tests are sensitive to the different shuffle order. Is it just a matter of finding the time to fix up the tests and any remaining cleanup, or are there other concerns that need to be investigated first?
@artem-ogre presumably the "ground truth" tests are sensitive to the different shuffle order
yes, in situations of polygons formed by concyclic points different vertex insertion order can produce different (but still correct) triangulation. Easy way to fix tests is to visually verify that triangulations are still correct and then regenerate the ground truth. But I had in mind solving this by detecting concyclic polygons and producing a unique triangulation. This would make tests less fragile. But this is a significant effort I did not find time for yet.
Does this mean that using randomized vs. unrandomized inputs should give the same results? If so that's definitely something we'd be very interested in.
Hi, I noticed that some work was done on this with the new-insertion-sequence branch. Do you have any updates on the status of that and when you might think it could be considered "done" and merged into master?
@here-abarany
Yes, the branch is almost ready, but I still need to test and investigate some corner-cases that are different.
Random generator change is a part of this branch.
Unfortunately, I have very little spare time to dedicate to CDT. I can't commit on any deadline, but I try my best.
Cool, thanks for the update.
Related: #112
Fixed.