NVIDIAGameWorks/PhysX

Raycasts fail for articulations after a while

jankrassnigg opened this issue · 40 comments

I have set up a ragdoll using articulations and I am observing rather weird behavior with raycasts. Just after a while of activity, parts of an articulation stop being hit by raycasts. It still interacts with other actors and can be pushed by a character controller, but raycasts just don't hit those parts anymore, as if suddenly the collision filter changed.

Here is a video (zipped because apparently I can upload larger zips than mp4s):

Raycasts through Articulation.zip

After a while the top beam stops hitting the articulation, for no apparent reason. Then the gun also doesn't hit those parts anymore, though it still hits the bottom half of the articulation.

I'm also attaching the corresponding PVD recording, though since articulations aren't supported that well by it, I don't see any useful data in it.

Articulation Raycasts.zip

Btw it is even worse once I apply large-ish impulses. By targeting the smaller shapes (here the legs and feet), apparently I can disable raycasts on the entire articulation.

This video shows the effect:

Articulation Impulses.zip

Are these using PxArticulation or PxArticulationReducedCoordinate?

The former

Haven't seen that before. Raycasts are supposed to work against articulations, even after a few simulation steps. Here's a random example: https://www.youtube.com/watch?v=DQ-cXDZJvek

You're right that the PVD capture isn't very useful. Could you perhaps try debug visualization? In particular for shapes and for scene query structures, to confirm that everything looks normal on that front.

As far as I can tell the public PVD was never updated to fully display all the articulation details. E.g. I had to do custom debug display for my joint setups to get my ragdolls working right, which would have been possible with PVD for regular bodies.

Anyway, what exactly do you mean by "try debug visualization"? Is that some flag I need to set, or is there a debug draw interface in PhysX that I'm not aware of?

I've used PxVisualizationParameter::eCOLLISION_SHAPES. I don't see a flag for visualizing scene queries. If there are other flags I should try, let me know.

Here's the video:
debug vis.zip

PxVisualizationParameter::eCOLLISION_DYNAMIC would visualize the pruning structure for dynamic objects. We might be able to visually check whether the BVH contains these articulation links.

image

Between the point where the upper beam hits the object and then goes through it, there was no change in the blue boxes.

This looks fine to me. So no idea what's happening here. I guess you don't have a simple repro we could look at?

Haha, "simple" and "PxArticulations" really don't go into the same sentence :D
I don't think I'd have the time to create a stand-alone repro case.

You could

  • Clone ezEngine
  • Check out branch "user/jk/physx-repo"
  • Make sure to check out all submodules
  • Run the top-level "GenerateWin64vs2019.bat" file
  • Open the sln file in the Workspace folder
  • Build everything
  • Run "Editor"
  • Open "Testing Chambers" sample project from the dashboard
  • Open "Empty" scene
  • Press Transform All Assets in the asset browser panel (need this once, since you'd have a clean checkout)
  • Press the Simulate button in the toolbar
  • Use the CVars panel in the editor to set "Physics.Debug.Draw" to > 0

Then you'll have the result as shown above.

Let me know whether that's an option for you.

Cannot have a look right now but one last question: did you try disabling your filter data (PxQueryFilterData) to confirm that the problem doesn't come from there?

Managed to build it. How can I enable regular rendering of these ragdoll links?

Any idea why it doesn't reach my breakpoints in ezPhysXWorldModule::Raycast on this scene? (Running in Debug with F5 of course).

Most likely because it's in a child process. Either manually attach to the "EditorEngineProcess" or use the ChildProcessDebugging extension for visual studio.

What do you mean with "regular rendering of these ragdoll links" ? Do you mean the PhysX visualization ?

"did you try disabling your filter data " I just tried something (though not sure it's what you meant), doesn't make a difference, though. In general there's no reason why any of the filters should be modified, neither on the ragdoll nor on the raycast.

Currently the material of the ragdoll mesh is fully transparent, you can fix it like this:

image

  • managed to debug the child process
  • disabled filter data, no luck
  • couldn't trace the raycast call so I recompiled 4.1.2 locally, replaced your libs/DLLs with mine, and now I can trace the code
  • but the behavior is different with my recompiled libs, the raycast doesn't stop like with yours
  • what I see however is still strange, the raycast hit (or at least the end of the beam) doesn't seem to smoothly follow the ragdoll's motion, it looks like it suddenly "snaps" from one discrete location to another. Something's definitely weird, still not sure what.

Is there a way to pause the simulation without resetting it?

Just checked the code, the snapping comes only from the beam render component, it updates the graphics mesh only with a limited precision.
You could go to ezRaycastComponent::Update(), and uncomment lines 197+198 then you should get pixel perfect debug rendering, if you need that.

"couldn't trace the raycast call"

it's possible that we don't ship the PDBs, we have to be very careful with how much data we put into GitHub

"but the behavior is different with my recompiled libs, the raycast doesn't stop like with yours"

You mean with your libs all rays go straight through ALL THE TIME ? That would be rather odd indeed.

There is no way to pause the simulation, but you can go to Scene > Simulation Speed and set it down to 10%. Not sure whether that helps in any way, though. Why would you want to fully pause it? I can have a look whether there is a quick way to hack that in.

Are you talking about these lines?

ezDebugRenderer::Line lines[] = { {rayStartPosition, rayStartPosition + rayDir * m_fMaxDistance} };
ezDebugRenderer::DrawLines(GetWorld(), lines, ezColor::RebeccaPurple);

Even if I put them back the hit position is far from ok. In attached video for example, the ragdoll is clearly moving but the impact position (of the top beam for example) remains fixed (and then later snaps to a different position).

Testing.Chambers.-.ezEditor.2022-01-20.21-07-23.mp4

"You mean with your libs all rays go straight through ALL THE TIME ?"

No it works "better" with replaced libs. The ray hits "something" until it makes sense that it doesn't. The snapping is still here though. (I have a video but it's too big)

Regarding pausing the simulation, add this code:

ezCVarBool cvar_PhysicsPause("Physics.Pause", false, ezCVarFlags::None, "Pauses the physics simulation");

void ezPhysXWorldModule::StartSimulation(const ezWorldModule::UpdateContext& context)
{
  // Skip the physics step entirely while the CVar is set.
  if (cvar_PhysicsPause)
    return;

Then you can pause the simulation from the cvar panel.

Couldn't capture the full sequence and keep it under 100 Mb but here's the end of it, until the top beam doesn't hit the ragdoll anymore. This is with replaced libs. No idea where the difference comes from, I used our "4.1.2 release branch" from Perforce, which should be the same as on GitHub.

Testing.Chambers.-.ezEditor.2022-01-20.21-16-32.mp4

Yes, those green lines are beam components and since in ezBeamComponent::CreateMeshes() it builds a string from the coordinates with fixed precision, I highly suspect that's the reason for the snapping (though it should be 1cm precision, not sure, the artifacts look like it's more).
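To illustrate the suspicion above, here is a minimal sketch of how building a string from coordinates with fixed precision can make a moving position appear to snap. It assumes two decimal places and metres as units; `formatCoord` and `sameKey` are hypothetical helpers for this example and are not taken from ezBeamComponent::CreateMeshes(), which may format its values differently.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one coordinate with two decimal places.
// With metres as units this quantizes the value to roughly 1 cm.
std::string formatCoord(float v)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.2f", v);
    return std::string(buf);
}

// Two positions are indistinguishable once their formatted text is identical.
// Anything keyed on that text would not update between such positions.
bool sameKey(float a, float b)
{
    return formatCoord(a) == formatCoord(b);
}
```

With this sketch, 1.231 and 1.234 both format as "1.23", so a hit point moving between them would look frozen, and it would only jump once the rounded text changes.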

Since you have additionally enabled the debug line rendering now, you could disable the beam component, so that it doesn't get in the way:

image

So your video makes sense. Oh right, one more data point: We don't actually use debug libs of PhysX, because with articulations they are unbearably slow. We use a checked build that has slight modifications so that it links with our debug build (see here: https://github.com/ezEngine/thirdparty-physx#debug). Not sure whether that behaves more buggy ??

Hmmm I compiled the debug config then replaced the debug folder here:

\ezEngine\Workspace\vs2019x64\PhysX\win.x86_64.vc142.md

...with mine. I changed from /MT to /MD in Visual Studio because our libs compile with static CRT by default.

I diffed the include folders, they're all the same (yours & mine) so I didn't touch that.

I recompiled, and it worked. Debug might be too slow generally but just for this single ragdoll it's ok.

Either way the behavior of these two builds is different.

Ok I have something. Put a breakpoint in ezPxErrorCallback::reportError, run your test, wait until you hit the breakpoint with an error message about an invalid pose :)

image

Damn! I specifically filtered those out (about a year ago) because I got those errors all the time (and then forgot about it).

Well, it explains why the raycast fails, it doesn't explain where the invalid pose comes from, since this is just running the simulation. And with your libs it seems not to generate invalid poses.... though I guess there must be something in PhysX that generates NaNs more easily in checked and release builds than in debug builds?

Btw. I've pushed a small update to the branch for easier debugging.

"Debug might be too slow generally but just for this single ragdoll it's ok."

Exactly. see #442 ;-)

Yeah that explains the raycast failure but not what the root of the problem is. Something somewhere is generating NaNs even though visually it all looks ok (the ragdoll doesn't explode, bounds look fine, etc). It's a bit weird because the "pose" (transform) would be the thing used to derive both the render transform and the bounds as well, so I cannot explain why it would be ok for these but not ok for raycasts. Generally speaking Release builds have less protections against NaNs but Debug & Checked should be fairly similar in that respect.

In any case now we should double-check all the link parameters, masses, inertia tensors, etc, in search of something fishy.

And beyond that.... I have to point out that these "maximal coordinate articulations" are gone in PhysX 5 so maybe it would be best to focus either on regular joints or reduced-coordinate articulations anyway. (Just saying).

Interesting, I thought the reduced-coordinate articulations are targeted towards robotics. I used PxArticulation because they gave quite stable results with high performance, whereas my feeling is that with regular joints it is much more difficult to set up stable ragdolls. Though I am a bit unhappy that I can't switch bones to kinematic.

So for a games scenario, what would you suggest. Switch to reduced-coordinates (what are the trade-offs?) or standard (6DOF?) joints?

Regarding the invalid pose, I had other situations where PhysX was checking for normalized vectors with a very low epsilon, which sometimes caused problems. Maybe the same thing happens here, just internally, and thus the raycast skips the ragdoll, even though the data is generally fine.

Also, just wanted to say thank you for the big help!

Yes the RC articulations were initially for robotics but they improved a lot in PhysX 5, to the point that the other articulations became redundant. But fair enough, in PhysX 4 the "old" articulations remain a good choice for ragdolls.

Yes the failing function does 3 tests:

	PX_CUDA_CALLABLE bool isValid() const
	{
		return p.isFinite() && q.isFinite() && q.isUnit();
	}

isFinite() would catch NaNs, but isUnit() would fail just for non-unit quats. That might be what's happening here. The epsilon in isUnit() is 1e-4f.
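A small stand-alone sketch of that three-part check may help show how a pose can be rejected without containing any NaN. The `Vec3`, `Quat` and `Transform` types below are stand-ins written for this example, not the PhysX classes; only the shape of the check and the 1e-4f tolerance are taken from the thread.

```cpp
#include <cmath>

// Stand-in types for illustration only; these are NOT PxVec3 / PxQuat / PxTransform.
struct Vec3
{
    float x, y, z;

    bool isFinite() const
    {
        return std::isfinite(x) && std::isfinite(y) && std::isfinite(z);
    }
};

struct Quat
{
    float x, y, z, w;

    bool isFinite() const
    {
        return std::isfinite(x) && std::isfinite(y) && std::isfinite(z) && std::isfinite(w);
    }

    float magnitude() const
    {
        return std::sqrt(x * x + y * y + z * z + w * w);
    }

    // The default tolerance mirrors the 1e-4f value mentioned above.
    bool isUnit(float tolerance = 1e-4f) const
    {
        return isFinite() && std::fabs(magnitude() - 1.0f) < tolerance;
    }
};

struct Transform
{
    Vec3 p;
    Quat q;

    // Same three tests as the quoted function: finite position,
    // finite rotation, unit-length rotation.
    bool isValid() const
    {
        return p.isFinite() && q.isFinite() && q.isUnit();
    }
};
```

In this sketch a quaternion such as (0, 0, 0, 1.0005) is finite but about 5e-4 away from unit length, so `isValid()` returns false for it even though nothing in the pose is a NaN.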

I spent some time disabling parts of the ragdoll setup (limits, compliance, etc) to pinpoint the source of the problem but no luck so far.

That's very useful to know, thank you! For PhysX 4, maybe I'll give regular bodies+joints a try, even just out of curiosity for whether the added flexibility has any use to me. I particularly would like to have a ragdoll in kinematic mode that I can switch to simulated on the fly, maybe even just partially. Currently I achieve hit boxes with a separate setup of kinematic query shapes, and replace it by an articulation when needed, which seems to work as well, but is a bit less convenient.

I'm not going to ask when PhysX 5 will be available to the general public ;-)

Considering that the articulation generally works fine, I would really bet that the q.isUnit() check is the issue. It would make sense that a release build can introduce enough precision loss that that fails, I've seen that countless times in other projects. Even compiler updates tend to break our unit tests regularly due to precision loss. I'd guess that if you just remove that check (or crank up the epsilon) it will start working again.

Yeah I wanted to remove the check but I'm having difficulties replacing the libs with a checked build (remember that the bug didn't happen with my replaced debug build). If I switch to checked builds I start getting compile errors about mismatched CRT libs and/or this:

#error Exactly one of NDEBUG and _DEBUG needs to be defined!

If you compile EZ in 'Dev' a regular checked build should work (the Dev build still has debug symbols enabled). If you do need a Debug build, here is what I did to make it work: https://github.com/ezEngine/thirdparty-physx#debug

Ok, confirmed. Managed to switch to checked builds, which reproed the problem with my libs.

Removing the isUnit() test makes the problem go away.

It also works if I increase the epsilon from 1e-4 to 1e-3:

	PX_CUDA_CALLABLE bool isUnit() const
	{
		//const float unitTolerance = 1e-4f;
		const float unitTolerance = 1e-3f;
		return isFinite() && PxAbs(magnitude() - 1) < unitTolerance;
	}

It's interesting that this function also exists with a different / larger epsilon:

	/**
	\brief returns true if finite and magnitude is reasonably close to unit to allow for some accumulation of error vs
	isValid
	*/
	PX_CUDA_CALLABLE bool isSane() const
	{
		const float unitTolerance = 1e-2f;
		return isFinite() && PxAbs(magnitude() - 1) < unitTolerance;
	}
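To compare the three tolerances mentioned in this thread side by side, here is a minimal stand-alone sketch. The constant and function names are made up for this example and do not exist in PhysX; only the numeric values and the form of the magnitude test come from the snippets above.

```cpp
#include <cmath>

// Tolerances quoted in this thread; the names are hypothetical.
constexpr float kIsUnitOriginal = 1e-4f; // isUnit() before the change
constexpr float kIsUnitRelaxed = 1e-3f;  // isUnit() after the change
constexpr float kIsSane = 1e-2f;         // isSane()

// Magnitude test with the same shape as isUnit() / isSane() above.
bool passes(float magnitude, float tolerance)
{
    return std::isfinite(magnitude) && std::fabs(magnitude - 1.0f) < tolerance;
}

// Counts how many of the three tolerances accept the given magnitude (0 to 3).
int acceptedBy(float magnitude)
{
    return int(passes(magnitude, kIsUnitOriginal)) +
           int(passes(magnitude, kIsUnitRelaxed)) +
           int(passes(magnitude, kIsSane));
}
```

A magnitude of 1.0005 is accepted by the relaxed and the sane tolerance but not by the original one, which matches the observation that raising the epsilon from 1e-4 to 1e-3 makes the problem go away. How far the articulation's quaternions actually drift was not measured in this thread.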

I'll ask around if anybody remembers the story there but I'll increase that epsilon in isUnit(), in any case.

I guess we can consider the issue as closed then, at least from my side I'm happy enough.