NVIDIAGameWorks/PhysX

Crash in AABBTree::refitNode

karl-zylinski-ourmachinery opened this issue · 7 comments

Hi, we are getting a crash in AABBTree::refitNode. I have been looking at the call stacks and would like some advice on how to debug this further.

Callstack:

[Inline Frame] PhysX_64.dll!refitNode(physx::Sq::AABBTreeRuntimeNode *) Line 365    C++
PhysX_64.dll!physx::Sq::AABBTree::refitMarkedNodes(const physx::PxBounds3 * boxes) Line 535 C++
PhysX_64.dll!physx::Sq::ExtendedBucketPruner::refitMarkedNodes(const physx::PxBounds3 * boxes) Line 280 C++
PhysX_64.dll!physx::Sq::AABBPruner::refitUpdatedAndRemoved() Line 822   C++
PhysX_64.dll!physx::Sq::AABBPruner::commit() Line 425   C++
PhysX_64.dll!physx::Sq::SceneQueryManager::afterSync(physx::PxSceneQueryUpdateMode::Enum updateMode) Line 408   C++
PhysX_64.dll!physx::NpScene::fetchResultsPostContactCallbacks() Line 2119   C++
PhysX_64.dll!physx::NpScene::fetchResults(bool block, unsigned int * errorState) Line 2195  C++

In refitNode, the pointer returned by
const PxU32* primitives = getPrimitives(indices, data);
is outside the heap because data is 3722304989. getPrimitives returns indices + (data>>5). Here data>>5 is 116322030, which sends the pointer far out of range, since mNbIndices (indices is fetched from mIndices in the frame above) is only 49986.
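The arithmetic above can be double-checked with a small standalone sketch. It does not include any PhysX headers; `primitiveOffset` and `offsetInRange` are hypothetical helper names, and the only thing taken from the report is that the offset is `data >> 5`.

```cpp
#include <cassert>
#include <cstdint>

// Offset into the index array encoded in a node's data word.
// Per the description above, getPrimitives adds (data >> 5) to the base pointer.
constexpr std::uint32_t primitiveOffset(std::uint32_t data)
{
    return data >> 5;
}

// True if the decoded offset points inside an index array of nbIndices entries.
// Hypothetical helper for illustration, not part of PhysX.
constexpr bool offsetInRange(std::uint32_t data, std::uint32_t nbIndices)
{
    return primitiveOffset(data) < nbIndices;
}
```

With the values from the crash, `primitiveOffset(3722304989u)` is 116322030, and `offsetInRange(3722304989u, 49986u)` is false, which matches the out-of-heap pointer seen in the debugger.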

In my code I am removing and adding a large number of actors (based on the result of a procedural generation). These crashes started happening when I began adding the actors using PruningStructures to speed everything up. I have already fixed another crash related to my usage of PruningStructures, see #540.

Does anyone have an idea what goes wrong, or where I should look next?

Hi,
Sorry to hear that you are having issues with the extended bucket pruner.
In the ExtendedBucketPruner class there should be:
bool ExtendedBucketPruner::checkValidity()

This is currently called only in debug builds. I expect debug would be too slow for your case, but maybe you can enable these checks for the CHECKED configuration and see whether the validation code identifies which operation is causing this? It should check validity after add, remove, etc.

Ales

@AlesBorovicka Thanks for the speedy reply. I tried enabling checkValidity() in my checked build. Unfortunately it does not complain about anything.

To me it seems like the mData of some of the last nodes in mRuntimePool is badly initialized or not initialized at all. The nodes fetched from mRuntimePool that had the bad mData were at index ~34000, which makes me wonder whether something wasn't reinitialized properly when mIndices grew past 32768 items. However, I just had a similar crash where mData looked fine (mData>>5 was only 24), but every single index in the tree's mIndices had the value 0xFFFFFFFF. Could something be going wrong when the tree grows?

Hmm, this is a bit strange. The checkValidity code should include a check for the primitive indices, so in theory this should be caught. If it's easy to reproduce on your side, could you add the validation right before the refit is called? The validation code was added to catch the state where the trees are invalid, so it should be possible to call it from more places to find the spot where the trees get corrupted.
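If the built-in validation stays silent, a narrower check for the two symptoms reported above could be dropped in right before the refit. The following is only a sketch on plain arrays, not the PhysX validation code: `firstOutOfRangeLeaf` and `allIndicesAreSentinel` are hypothetical names, and the bit layout (bit 0 as leaf flag, bits 1 to 4 as primitive count) is an assumption; only the `data >> 5` offset comes from this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the position of the first leaf whose primitive range falls outside
// an index array of nbIndices entries, or -1 if every leaf is in range.
// ASSUMPTION: bit 0 = leaf flag, bits 1..4 = primitive count, bits 5.. = offset.
inline std::ptrdiff_t firstOutOfRangeLeaf(const std::vector<std::uint32_t>& nodeData,
                                          std::size_t nbIndices)
{
    for (std::size_t i = 0; i < nodeData.size(); ++i)
    {
        const std::uint32_t data = nodeData[i];
        if ((data & 1u) == 0)
            continue; // internal node, nothing to check here
        const std::uint64_t offset = data >> 5;
        const std::uint64_t count = (data >> 1) & 15u;
        if (offset + count > nbIndices) // 64-bit sum cannot overflow
            return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}

// True if the index array is non-empty and every entry is 0xFFFFFFFF,
// which is the second symptom described above.
inline bool allIndicesAreSentinel(const std::vector<std::uint32_t>& indices)
{
    return !indices.empty() &&
           std::all_of(indices.begin(), indices.end(),
                       [](std::uint32_t v) { return v == 0xFFFFFFFFu; });
}
```

Calling both checks after each add and remove, and again just before the refit, should narrow down which operation leaves the tree in a bad state.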

I tried this, but the validation didn't catch anything.

Hmm, I wonder what else to try. Based on the stack, it seems that the refit is happening on one of the trees in the tree of trees inside the extended bucket pruner. That means the tree should never grow. It holds a pruning structure, which is a single tree, and there is no way to add more objects to it. You can remove objects from it, but that should just mark nodes as invalid; the tree will not grow.
The only thing that grows is the main tree, but based on the stack that does not seem to be where the crash happens, if I understand the stack correctly.
I am still a bit puzzled why the validation does not find this, since it is supposed to check all the nodes in the trees. I guess there is no easy way for me to try to repro this issue?

I thought a bit more about this. I am creating the shapes, actors and pruning structures on a separate thread, but I only add and remove pruning structures / actors to the scene on the same thread as the PhysX update. What is the intended usage here? I assumed that I could create the pruning structure on a separate thread, even while the PhysX update is running, since creating pruning structures can be quite CPU intensive.
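The setup described above can be sketched roughly as a mutex-protected hand-off queue. This is an illustration only: `BuiltBatch`, `HandOffQueue` and `buildBatches` are hypothetical names, no PhysX calls are made, and the sketch says nothing about whether the pruning structure build itself is thread safe.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a finished batch (actors plus their pruning structure).
struct BuiltBatch
{
    int id;
};

// Hand-off point between the builder thread and the simulation thread.
class HandOffQueue
{
public:
    // Called by the builder thread once a batch is completely built.
    void push(BuiltBatch batch)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mPending.push_back(std::move(batch));
    }

    // Called by the simulation thread between updates; takes all pending batches.
    std::vector<BuiltBatch> drain()
    {
        std::lock_guard<std::mutex> lock(mMutex);
        std::vector<BuiltBatch> out;
        out.swap(mPending);
        return out;
    }

private:
    std::mutex mMutex;
    std::vector<BuiltBatch> mPending;
};

// Builder-side work: create batches and hand them over, never touching the scene.
inline void buildBatches(HandOffQueue& queue, int count)
{
    for (int i = 0; i < count; ++i)
        queue.push(BuiltBatch{i});
}
```

In this arrangement the builder would run `buildBatches` on its own thread, and the simulation thread would call `drain()` after fetchResults and add the returned batches to the scene, so that scene mutation stays on one thread.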

Yes, it should be possible to build this on a separate thread. It could indeed be one of the leads, though: the AABBTree computation might not be thread safe, and the AABBTree computation is the only thing the pruning structure does, so it is the same code that you see crashing.
Looking at the code, I don't see an obvious reason why it should not be thread safe.

Can you confirm whether the crash goes away if you build the pruning structure on the same thread as the PhysX update?