emer/axon

GPU diverges from CPU on PTMaintLayer nmda computation


Even when using the same mat32.FastExp implementation, the Gnmda values diverge between GPU and CPU.

Run the TestStdGPUnData1Debug test in examples/boa (set TEST_DEBUG=true to enable it) to see these divergences. I verified in the GUI that the divergence emerges over cycles, specifically in the Gnmda current (use Plot Unit value with cycle-level raster recording to see it).
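To pin down exactly where the divergence first appears, a small helper can scan per-cycle recordings of one variable (such as Gnmda) from the CPU and GPU runs and report the first cycle and unit that differ beyond a tolerance. This is a minimal sketch, not part of axon: the name firstDivergence, the [cycle][unit] layout, the tolerance, and the sample data are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// firstDivergence is a hypothetical helper. cpu[c][u] and gpu[c][u] hold the
// value of unit u at cycle c. It returns the first (cycle, unit) whose absolute
// difference exceeds tol; ok is false if the recordings agree everywhere.
// A NaN on either side also counts as a divergence.
func firstDivergence(cpu, gpu [][]float32, tol float32) (cycle, unit int, ok bool) {
	for c := range cpu {
		if c >= len(gpu) {
			break
		}
		for u := range cpu[c] {
			if u >= len(gpu[c]) {
				break
			}
			d := float32(math.Abs(float64(cpu[c][u] - gpu[c][u])))
			if d > tol || math.IsNaN(float64(d)) {
				return c, u, true
			}
		}
	}
	return 0, 0, false
}

func main() {
	// Hypothetical recordings: 3 cycles x 2 units; the GPU drifts at cycle 2.
	cpu := [][]float32{{0.10, 0.20}, {0.11, 0.21}, {0.12, 0.22}}
	gpu := [][]float32{{0.10, 0.20}, {0.11, 0.21}, {0.12, 0.2201}}
	if c, u, ok := firstDivergence(cpu, gpu, 1e-6); ok {
		fmt.Printf("first divergence at cycle %d, unit %d\n", c, u)
	}
}
```

Recording every cycle is what makes this useful: a difference that starts in the last bits at one cycle and grows afterwards shows up as a single, early (cycle, unit) pair rather than as a vague end-of-trial mismatch.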

For testing purposes, it would be great to fix this.

This is not due to FastExp: I added a compute test for it in vgpu, and the GPU and CPU results are identical.

Old but informative: https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/

MoltenVK (Mac) has -fast-math enabled by default, unless an individual shader specifies the SignedZeroInfNanPreserve execution mode. I can't figure out how to get a shader to specify that.
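The core point of the randomascii article applies here: floating-point addition is not associative, so a compiler running with fast math may legitimately reorder a sum and produce a different float32 result than the CPU code does. The Go sketch below shows this with three values. It only illustrates reassociation in general; it is not a reproduction of what the Metal compiler actually does to the NMDA shader.

```go
package main

import "fmt"

// sumLeft computes (a + b) + c, rounding to float32 after each addition.
func sumLeft(a, b, c float32) float32 {
	return float32(float32(a+b) + c)
}

// sumRight computes a + (b + c), rounding to float32 after each addition.
func sumRight(a, b, c float32) float32 {
	return float32(a + float32(b+c))
}

func main() {
	a, b, c := float32(1e8), float32(-1e8), float32(1)
	// float32 values near 1e8 are spaced 8 apart, so -1e8 + 1 rounds
	// back to -1e8 and the 1 is lost in the right-grouped sum.
	fmt.Println(sumLeft(a, b, c), sumRight(a, b, c)) // prints "1 0"
}
```

In a model that integrates a conductance over many cycles, a last-bit difference of this kind at one step can feed into every later step, which is consistent with a divergence that emerges gradually rather than all at once.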

Whether this is supported is reported by VkPhysicalDeviceFloatControlsPropertiesKHR (e.g. shaderSignedZeroInfNanPreserveFloat32): https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkPhysicalDeviceFloatControlsPropertiesKHR.html

KhronosGroup/MoltenVK#1691: MoltenVK supports it.

However, I can't find anything about how to get dxc to emit it: the -Gis option complains that it is not supported in SPIR-V mode.

I also don't see an obvious option in glslc.

Giving up on this path for now.