use fma for tolerant comparisons
moon-chilled opened this issue · 10 comments
The tolerant compare uses a calculation of the form a < b*c. This can be done faster and with more precision by looking at the sign bit of fma(b,c,-a). But it breaks with negative zero.
err, that should be fma(b,-c,a). fma(b,c,-a) is for <=. Because exact 0 or positive underflow will be positive 0; negative 0 can only be from negative underflow. (Assuming no negative 0 inputs.)
I think negative 0 is not an issue since it will occur on the fringe of the tolerant range.
You need to verify that this will work for infinities, which may produce NaN intermediate values.
It is vital that TEQ, which is used for singletons, be updated to track the code used in eqDD.
Tolerant a=b is
a>cct*b = b>cct*a
I want to replace it with:
signbit(cct*b-a) = signbit(cct*a-b)
But suppose a is _0, b is 0. Then we have:
signbit(cct*0-_0) = signbit(cct*_0-0)
signbit(0-_0) = signbit(_0-0)
signbit(0) = signbit(_0)
Which is false, but we want _0=0.
Good point about nan; looks like ieee doesn't specify the sign of a nan result. We want signbit(_+__) to be different from signbit(__+_); if we're lucky, x86 will spec it so the nan sign always comes from the right addend (or left; w/e). Will look later.
Can do something similar for the tolerant zero problem: replace x=y with (x<.y) = (x>.y). On x86, x>.y and x<.y are both y if x=y, so that ensures that, if both are zero, the arguments to the comparison will be the same zero. It's the same number of operations as the current implementation (replaces compares with max/min). (On arm, however, <. and >. consider 0 to be greater than _0, so this won't work there; bully for arm.)
if what's true?
_0 - 0 and _0 + _0 are both _0.
Seems x86 gives you -nan for both _+__ and __+_. I guess they want + to be commutative. Annoying.
Actually, it might be to count ulps. I think this is basically what tolerant comparison does, but need to review the definition and think it over.
nevermind, not ulp measure because it uses magnitude, not just exponent
But we can determine an upper and lower ulp bound (tolerant hashing basically derives a slightly coarse upper bound). If most comparands are either below the lower bound (definitely equal) or above the upper bound (definitely unequal), then it may be a win to branch to see if they're in between.
This just works (except for -0?) for comparands of opposite sign, but not for inf. Grumble.