NVIDIAGameWorks/PhysX

Raycasts may fail after moving a lot of objects

francis-page opened this issue · 10 comments

Hi,

I've been investigating a low occurence bug where raycasts fail after moving a lot of objects around. I think I finally found what is causing it. When switching to the new AABBTree in AABBPruner::commit when a refit happened in the previous frame, the new AABBTree will contain outdated data. What happens:

In the frame where AABBPruner state is BUILD_LAST_FRAME:

  • Game code moves lots of objects around
  • AABBPruner::buildStep: nothing is done
  • AABBPruner::commit: refit is made in the current tree

In the next frame, where AABBPruner state is BUILD_FINISHED:

  • AABBPruner::commit: current tree is replaced by the new tree, but it misses the refit made in the previous frame

I tried to skip the BUILD_LAST_FRAME step and it seems to fix my issue. It's probably not the right fix as that step must have a reason to be.

I'll investigate ASAP.

Merci! (Oh... I remember browsing your Coder Corner with great interest in the early 2000s!)

I haven't been able to reproduce this so far. Are you doing anything out of the ordinary for this to happen?

I did various tests like in the attached video, with a lot of objects moving around, and a test raycast against a particular object. The blue lines visualize the AABB tree for the scene, and it's rebuilt approximately every 100 frames. The test raycast never misses the test object, even though it's moving quite fast so any missed "refit" operation on the BVH should be exposed by this.

I suppose you don't have an easy repro for me to look at?

Github550.mp4

Hi Pierre,

I attached a modified version of the Hello World snippet that will eventually reproduce the issue if you let it run. Just put a breakpoint on line 217 and it will stop when a ray cast fails. I also added this: printf("%p %d\n", this, mProgress); in AABBPruner::buildStep so I could follow the state of the pruners. If you add this, you will see that it always happens when the objects were moved when mProgress value was 5 (BUILD_LAST_FRAME). I also added some code in AABBPruner::commit to output the global bounding box of the current tree. The output when the bug occurs looks like this:

...
000001E01C928200 0
Moving objects to -4603.085938 1386.982300 -540.886414
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 3
000001E01C928200 0
000001E01C922890 4
000001E01C928200 0
Moving objects to -4359.113281 1271.340088 -536.499390
000001E01C922890 5
000001E01C922890 AABBPruner::commit switching trees, new world bounds X (-4603.135742..-4353.036133) Y (1386.477295..1387.487305) Z (-540.936401..-290.836548)
000001E01C928200 0

You can see that the new world's bounding box corresponds to the previous position of the objects and not to the new position that was set in the previous frame.

Note that I used static rigid bodies because that's what was used (sorry, I forgot to specify it). In the original project, the objects are never moved except when a specific event occurs and all the objects are moved somewhere else.

SnippetHelloWorld.zip

Ok, for PhysX 4 you will need to modify the code in 2 places:

if(mProgress==BUILD_NEW_MAPPING || mProgress==BUILD_FULL_REFIT)

if(mProgress == BUILD_NEW_MAPPING || mProgress == BUILD_FULL_REFIT)

Add the last part with BUILD_LAST_FRAME there:

if(mProgress==BUILD_NEW_MAPPING || mProgress==BUILD_FULL_REFIT || mProgress==BUILD_LAST_FRAME)

And that should fix the bug.

The bug happened because BUILD_LAST_FRAME was added a long time after the initial code was written, and we forgot to update these two lines.

It went undetected until now (including in my own repro attempt above) because the scene tree contains N objects per leaf node. So the last tested AABB is not actually the AABB around just one moved object, it's the AABB around a group of spatially close objects. As long as the group's bounds still enclose the previous position of the moved object, the bug doesn't appear. It only appears if you teleport everything far away from the previous positions, as in your repro.

Thanks for that one, it was not an obvious bug.

Thanks for that one, it was not an obvious bug.

Thanks to you! And yeah, it wasn't an easy one! It took me almost two weeks to get a proper repro and eventually isolate the cause.

Two weeks is not bad. According to the Perforce history, this bug has been introduced at some point between 2013 and 2014 (I couldn't find exactly when). And nobody ever noticed. So, congratulations :)

Will take this into Unity :-)

About this issue || mProgress==BUILD_FINISHED Is it necessary?

because in some places buildStep doesn't commit right away

void SceneQueryManager::sceneQueryBuildStep(PruningIndex::Enum index)
{
	PX_PROFILE_ZONE("SceneQuery.sceneQueryBuildStep", mScene.getContextId());

	if (mPrunerExt[index].pruner() && mPrunerExt[index].type() == PxPruningStructureType::eDYNAMIC_AABB_TREE)
	{
		const bool buildFinished = static_cast<AABBPruner*>(mPrunerExt[index].pruner())->buildStep(false);
		if(buildFinished)
		{
			mPrunerNeedsUpdating = true;
		}
	}
}

@PierreTerdiman Reference francis-page example,add the code to the sceneDesc Settings:

sceneDesc.sceneQueryUpdateMode = PxSceneQueryUpdateMode::eBUILD_ENABLED_COMMIT_DISABLED;

This problem still exists. because in buildStep,BUILD_FINISHED state not commit right away

void SceneQueryManager::afterSync(PxSceneQueryUpdateMode::Enum updateMode)
{
	PX_PROFILE_ZONE("Sim.sceneQueryBuildStep", mScene.getContextId());

	if(updateMode == PxSceneQueryUpdateMode::eBUILD_DISABLED_COMMIT_DISABLED)
	{
		mPrunerNeedsUpdating = true;
		return;
	}

	// flush user modified objects
	flushShapes();

	bool commit = updateMode == PxSceneQueryUpdateMode::eBUILD_ENABLED_COMMIT_ENABLED;

	for(PxU32 i = 0; i<2; i++)
	{
		if(mPrunerExt[i].pruner() && mPrunerExt[i].type() == PxPruningStructureType::eDYNAMIC_AABB_TREE)
			static_cast<AABBPruner*>(mPrunerExt[i].pruner())->buildStep(true);

		if(commit)
			mPrunerExt[i].pruner()->commit();
	}

	//If we didn't commit changes, then the next query must perform that operation so this bool must be set to true, otherwise there should be 
	//no outstanding work required by queries at this time
	mPrunerNeedsUpdating = !commit;
}