`u32::max` / `u32::min` require `OpCapability U8`
Firestar99 opened this issue · 6 comments
u32::max / u32::min cannot be used as they require OpCapability U8. They should just work out of the box.
Workaround: use u32::clamp() instead.
It's the Ordering enum. it has #[repr(i8)] (with -1, 0, +1 values)
Maybe we should special case the Ordering enum to be a u32?
Wouldn't that mean it takes more space on devices that support OpCapability U8?
Used within a shader not much would change, the registers are (usually) 32bit anyway and you'd want to do the actual comparison directly afterwards anyway to reduce it to bools, which on many platforms have more compact storage. (eg. on AMD they represent bools by bits in a scalar register instead of using vector registers.)
The bigger issue is when the Ordering enum is written to or read from a buffer, as their representation between CPU and GPU would now differ and any struct containing Ordering as a member would me misaligned.
This would open up the discussion if we should automatically widen u8 and u16 to u32 for arbirary types that are used only within the shader, and whether to bitpack them automagically when loading / storing them to a buffer.
But honestly pretty much all desktop devices support U8 and U16 loads and stores anyway...
What about a way simpler solution: Overwrite the method {u8, u16, u32, ..., f32}::max and min to just do what f32::clamp does, which does not use the Ordinal enum at all:
pub fn clamp(mut self, min: f32, max: f32) -> f32 {
assert!(min <= max, "min > max, or either was NaN. min = {min:?}, max = {max:?}");
if self < min {
self = min;
}
if self > max {
self = max;
}
self
}What do we lose with that? I assume nothing?
Exactly, it would just make those common methods work without breaking anything else. Now everything using Ordinal directly or implementing Ord / PartialOrd would still fail to compile without OpCapability U8. But I think that's a fine tradeoff, as comparing custom types is rarely done on GPUs anyway, whereas min and maxing int / floats is used everywhere.
I assume the impl with Ordinal is there for a reason, so perhaps we could even switch back to that when OpCapability U8 is used.