GPU diverges from CPU on PTMaintLayer nmda computation
Even when using the same mat32.FastExp implementation, the Gnmda values diverge between GPU and CPU.
Run the TestStdGPUnData1Debug test in examples/boa (set TEST_DEBUG=true to enable) to see these divergences. I verified in the GUI that the divergence emerges over cycles, specifically in the Gnmda current (use Plot Unit value with cycle-level raster recording to see it).
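For a quick check outside the GUI, a per-cycle comparison could be sketched roughly as follows. This is a hypothetical helper, not part of the test in examples/boa: the name firstDivergence, the tolerance, and the sample traces are all assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// firstDivergence returns the index of the first cycle at which the CPU and
// GPU traces differ by more than tol, or -1 if they agree over the compared
// range. Hypothetical helper: name and signature are illustrative only.
func firstDivergence(cpu, gpu []float32, tol float32) int {
	// Compare only the cycles present in both traces.
	n := len(cpu)
	if len(gpu) < n {
		n = len(gpu)
	}
	for i := 0; i < n; i++ {
		d := float32(math.Abs(float64(cpu[i] - gpu[i])))
		if d > tol {
			return i
		}
	}
	return -1
}

func main() {
	// Made-up traces standing in for per-cycle Gnmda values.
	cpu := []float32{0.10, 0.20, 0.30, 0.40}
	gpu := []float32{0.10, 0.20, 0.31, 0.45}
	fmt.Println("first divergent cycle:", firstDivergence(cpu, gpu, 1e-3))
}
```

Setting tol to 0 asks for bit-for-bit agreement of the values, which is the stricter check if the goal is identical CPU and GPU results rather than merely close ones.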
For testing, it would be great to fix this.
This is not due to FastExp -- I added a compute test for this in vgpu, and the GPU and CPU results are identical.
old but informative: https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
MoltenVK (Mac) has -fast-math enabled by default, unless an individual shader specifies the SignedZeroInfNanPreserve execution mode. I can't seem to figure out how to make that happen.
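To illustrate why fast-math alone could be enough to produce this kind of divergence, here is a minimal sketch in Go. It is not the Gnmda code, and it does not reproduce what the Metal compiler actually does to the shader. It only shows that reassociating a float32 sum, one of the rewrites that fast-math modes typically permit, can change the result. The function names and input values are made up for illustration.

```go
package main

import "fmt"

// sumLeft adds in source order: (a + b) + c.
func sumLeft(a, b, c float32) float32 {
	return float32(a+b) + c
}

// sumRight uses the other grouping, a + (b + c), which a compiler is free
// to choose once it is allowed to treat float addition as associative.
func sumRight(a, b, c float32) float32 {
	return a + float32(b+c)
}

func main() {
	// At magnitude 1e8 adjacent float32 values are 8 apart, so b + c
	// rounds back to -1e8 and the contribution of c is lost.
	var a, b, c float32 = 1e8, -1e8, 1
	fmt.Println(sumLeft(a, b, c), sumRight(a, b, c)) // prints "1 0"
}
```

In a value that is updated recurrently every cycle, a much smaller difference in the last bits can be fed back and grow over time, which would be consistent with the divergence emerging over cycles rather than showing up on the first one.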
There is a property here saying whether it is supported: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkPhysicalDeviceFloatControlsPropertiesKHR.html
KhronosGroup/MoltenVK#1691 -- moltenvk supports it
Just no hits anywhere about how to convince dxc to do it -- the -Gis option complains that it is not supported in SPIR-V mode.
I also don't see any obvious option in glslc.
giving up on this path for now.