Wrong behavior in mesh_distance when using parallelization methods other than OpenMP
lasagnaphil opened this issue · 3 comments
I had problems with strange results while using Intel TBB along with this library. The problem is in MeshDistance::distance()
and MeshDistance::signedDistance()
. This happens when the code is roughly like this:
auto trimesh = std::make_unique<Discregrid::MeshDistance>(...)
tbb::parallel_for(0, N, [&](size_t i) {
auto dist = trimesh->signedDistance(v); //race condition!
})
Although these functions are documented as thread-safe, it seems to be so only in the context of using OpenMP, since it relies on using the thread index from omp_get_thread_num()
to make sure that two different threads do not access the same mutable field (such as m_nearest_face). When using this function with other parallelization methods that do not rely on omp_get_thread_num()
, errors will likely happen (such as race conditions, or just plain invalid results being returned)
From delving into the code for MeshDistance, I'm actually confused why there are mutable fields indexed with omp_get_thread_num()
, it seems like these fields aren't really necessary for the algorithm to work. m_queues
actually seems redundant for the whole code, and m_nearest_face
isn't necessary when you can make it as a local variable. The only reason where these mutable fields are needed seems to be for the cached version of the methods (signedDistanceCached
, unsignedDistanceCached
), but these are still thread-unsafe because of the reason I've mentioned (reliance of OpenMP thread-ids)
In my opinion it would be best to remove the mutable fields, and move the caching functionality (which I assume was the main reason for making those mutable fields) outside of the MeshDistance class. This will allow the methods to be thread safe, while also allowing for OpenMP-safe caching if the user needs it.
Thank you very much for your detailed report. We are looking into this now.
Apologies for the delay. We reimplemented the signed distance to a triangle mesh from the ground up and published it in a separate repo TriangleMeshDistance. The new implementation solves the problem you had.
Again, thanks for your detailed report.
Best regards,
José.
Thanks for all the hard work! I was originally changing some things in my own fork to solve this bug along with other minor issues (I was also planning to remove the Eigen dependency myself but hadn't got the time to do so). Now that the distance calculation code is out as a separate header-only library, I think I'll switch to that one instead.
Sincerely,
Phillip Chang