Capability is required unexpectedly for constant cast
Opened this issue · 27 comments
I have the following trivial situation:
```rust
const K: u8 = 20;
...
let position = 2u32;
let global_y_offset_bits = position * K as u32;
...
```

Here it should be trivial for the compiler to cast `u8` to `u32` at compile time and make the last line effectively this:

```rust
let global_y_offset_bits = position * 20u32;
```

Yet for some reason it doesn't happen and I'm getting this instead:
```
error: `u8` without `OpCapability Int8`
  --> /home/nazar-pc/.rustup/toolchains/nightly-2024-11-22-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/mod.rs:1298:5
   |
1298 | / uint_impl! {
1299 | |     Self = usize,
1300 | |     ActualT = u32,
1301 | |     SignedT = isize,
 ... |
1315 | |     bound_condition = " on 32-bit targets",
1316 | | }
   | |_____^
   |
   = note: used by unnamed global (%1504)
note: used from within `ab_proof_of_space_gpu::shader::compute_f1::compute_f1`
  --> ../src/shader/compute_f1.rs:85:47
   |
85 |     let global_y_offset_bits = position * K as u32;
   |                                           ^^^^^^^^
```
I think this is a bug. Even if it were a variable assigned a literal, I'd expect the compiler to inline it with the correct type. And I believe rustc actually does this for CPU architectures like x86-64.
I imagine the inability to handle cases like this will result in suboptimal output in many places, so it'd be nice to address the root cause.
UPD: `const K_U32: u32 = K as u32;` as a workaround helps, but ideally it shouldn't be necessary.
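A minimal standalone sketch of that workaround (the helper function name is hypothetical, based on the snippet above):

```rust
const K: u8 = 20;
// Workaround: hoist the cast into a const so only a u32 constant
// reaches the codegen backend, no u8 arithmetic survives.
const K_U32: u32 = K as u32;

// Hypothetical helper mirroring the snippet from the report.
fn global_y_offset_bits(position: u32) -> u32 {
    position * K_U32
}

fn main() {
    assert_eq!(global_y_offset_bits(2), 40);
}
```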
Hmmm, yeah I would expect rustc to get rid of the u8 before we even see it 🤔
Well, I still hit the same with a slightly more sophisticated example:

```rust
const K: u8 = 20;
const PARAM_EXT: u8 = 6;

fn foo(x: u32) -> u32 {
    x << K - PARAM_EXT
}
```

And similarly with `u32::from()`:

```rust
const K: u8 = 20;
const PARAM_EXT: u8 = 6;

fn foo(x: u32) -> u32 {
    x << u32::from(K - PARAM_EXT)
}
```

There are two levels of depth here:
- constant expressions do not seem to be evaluated before being passed further down, so `u32::from(K)` works, but `u32::from(K - PARAM_EXT)` doesn't
- for bit shifts specifically, the shift amount can be safely widened to `u32` or `usize` without changing the result, so this probably should be done automatically (I don't think there is anything else that could be done about it even on platforms that do support `Int8`, which would yield better performance)
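A standalone sketch of the manual widening the second point describes, assuming the constants from the earlier examples:

```rust
const K: u8 = 20;
const PARAM_EXT: u8 = 6;

// Widen both constants before subtracting, so no u8 arithmetic
// (and hence no Int8 requirement) survives into the shift amount.
fn foo(x: u32) -> u32 {
    x << (u32::from(K) - u32::from(PARAM_EXT))
}

fn main() {
    assert_eq!(foo(1), 1 << 14);
}
```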
I have a substantial amount of shifts like that in the code: on one hand, that is how the protocol specification defines the operations (so it is easier to follow the code), and on the other, I expect it to be a zero-cost abstraction, with the compiler combining constants and variables composed exclusively of constants into a single value at compile time.
Here are two examples of functions with expressions that I expect to be fully inlined:
https://github.com/nazar-pc/abundance/blob/c98d6fa9ad92131280063c1c8bda52d107adf51d/crates/shared/ab-proof-of-space/src/chiapos/table.rs#L150-L188
https://github.com/nazar-pc/abundance/blob/c98d6fa9ad92131280063c1c8bda52d107adf51d/crates/shared/ab-proof-of-space/src/chiapos/table.rs#L340-L424
For example, this:
```rust
const fn y_size_bits(k: u8) -> usize {
    k as usize + PARAM_EXT as usize
}

const fn metadata_size_bits(k: u8, table_number: u8) -> usize {
    k as usize
        * match table_number {
            1 => 1,
            2 => 2,
            3 | 4 => 4,
            5 => 3,
            6 => 2,
            7 => 0,
            _ => unreachable!(),
        }
}

let parent_metadata_bits = metadata_size_bits(K, PARENT_TABLE_NUMBER);
let left_metadata_bits =
    left_metadata << (u128::BITS as usize - parent_metadata_bits - y_size_bits(K));
```

is expected to compile to effectively this (`K` and `PARENT_TABLE_NUMBER` are generic constants):

```rust
let left_metadata_bits = left_metadata << COMPILER_GENERATED_CONSTANT;
```

And it does look like that when compiling to x86-64 assembly, while SPIR-V codegen seems to translate it from source code to SPIR-V much more literally.
NOTE: Some variables consisting of constants need to remain variables, at least for now:
```
error[E0401]: can't use generic parameters from outer item
   --> crates/shared/ab-proof-of-space/src/chiapos/table.rs:352:58
    |
340 | pub(super) fn compute_fn<const K: u8, const TABLE_NUMBER: u8, const PARENT_TABLE_NUMBER: u8>(
    |                                - const parameter from outer item
...
352 |     const METADATA_SIZE_BITS: usize = metadata_size_bits(K, TABLE_NUMBER);
    |                                                          ^ use of generic parameter from outer item
    |
    = note: a `const` is a separate item from the item that contains it
```
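A standalone sketch of the E0401 restriction and the `let` workaround (simplified from the linked code; the exact body is illustrative):

```rust
const fn metadata_size_bits(k: u8, table_number: u8) -> usize {
    k as usize
        * match table_number {
            1 => 1,
            2 => 2,
            3 | 4 => 4,
            5 => 3,
            6 => 2,
            7 => 0,
            _ => unreachable!(),
        }
}

fn compute_fn<const K: u8, const TABLE_NUMBER: u8>() -> usize {
    // const METADATA_SIZE_BITS: usize = metadata_size_bits(K, TABLE_NUMBER);
    // ^ E0401: a `const` item is a separate item and can't name outer generics.
    let metadata_size_bits = metadata_size_bits(K, TABLE_NUMBER); // works
    metadata_size_bits
}

fn main() {
    assert_eq!(compute_fn::<20, 2>(), 40);
}
```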
Maybe worth reopening this issue (I can't do that myself).
I also noticed that const functions more generally seem to require u8 as well, but I think I hit too many issues like those mentioned above for it to be sustainable for me to move forward.
Yeah, I am checking whether the arg is a constant. I assumed other passes / logic would fold constant math into the resulting constant, but looks like not so much? Weird.
Gives you an idea of how much of this is probably offloaded to LLVM in other targets 🤯
Well, at the very least I would expect spirv-opt to handle things like this at the very end
It has:
```
--ccp
    Apply the conditional constant propagation transform. This will
    propagate constant values throughout the program, and simplify
    expressions and conditional jumps with known predicate
    values. Performed on entry point call tree functions and
    exported functions.
```
Perhaps we are not using that.
This looks like all we are setting:
I'm now wondering what other goodies are in there. From the description, `--ccp` alone should solve a bunch of my problems.
Ah, that is on the SPIR-V side already. I'd expect more to be done on the Rust side though; things like those mentioned above should ideally never reach SPIR-V in the first place.
My expectation is that the compiler will at least recursively compute all constant expressions until some runtime information is needed.
Similar to llvm, we rely on the spirv tools to do things downstream in some places. I am not super familiar with this area of the code base though.
Yeah, I would have expected constants to be handled by rustc before the codegen backend is even called.
> constant expressions do not seem to be evaluated before being passed further down, so `u32::from(K)` works, but `u32::from(K - PARAM_EXT)` doesn't
We currently only have const folding for add and multiply, not for any other operations:
rust-gpu/crates/rustc_codegen_spirv/src/builder/builder_methods.rs
Lines 1538 to 1560 in 3d5e301
Let me just add some more :D
I've been playing around with const folding within rustc. It seems to emit `x << 14u8` with your expression, with the non-const `<<` implicitly converting the `u8` to a `u32`. But if I explicitly convert to a `u32` within a const expr, it'll emit `x << 14u32` and compile without needing `Int8`.
```rust
const K: u8 = 20;
const PARAM_EXT: u8 = 6;

fn foo(x: u32) -> u32 {
    // fails due to requiring u8
    let result = x << K - PARAM_EXT;
    let result = x << (K - PARAM_EXT) as u32;
    let result = x << const { K - PARAM_EXT };
    // works
    let result = x << const { (K - PARAM_EXT) as u32 };
    const BLA: u32 = (K - PARAM_EXT) as u32;
    let result = x << BLA;
    result
}
```

The original (first) expression emits `x << (((20 - 6) & 31) as u32)`. No idea where the `& 31` comes from; it's certainly not from rust-gpu, so likely rustc added that. (This is without spirv-opt, which may inline some of it.)
```
%6 = OpTypeInt 32 0
%13 = OpTypeInt 8 0
%14 = OpConstant %13 20
%15 = OpConstant %13 6
%16 = OpConstant %13 31
%18 = OpLoad %6 %2
%19 = OpISub %13 %14 %15
%20 = OpBitwiseAnd %13 %19 %16
%21 = OpUConvert %6 %20
%22 = OpShiftLeftLogical %6 %18 %21
```

> Similar to llvm, we rely on the spirv tools to do things downstream in some places. I am not super familiar with this area of the code base though.

> Yeah, I would have expected constants to be handled by rustc before the codegen backend is even called.
That makes a lot of sense, but I think the amount of information arriving at LLVM vs spirv-opt is drastically different. Various assertions, including things like core::hint::assert_unchecked() can help with codegen, but I don't think they are representable in SPIR-V the same way they can be in LLVM IR, meaning a lot of optimizations can only be done before SPIR-V.
I guess this means reimplementing some of these optimizations in rustc, which might also benefit Cranelift, so it doesn't need to be reimplemented in every codegen over and over again.
#317 works for many cases that didn't work before, and tests still pass, nice!
Here is something that it doesn't seem to understand yet (sorry, I didn't attempt to reduce it):
```rust
#[inline(always)]
fn compute_fn<const TABLE_NUMBER: u8, const PARENT_TABLE_NUMBER: u8>(
    invocation_id: UVec3,
    num_workgroups: UVec3,
    workgroup_size: u32,
    matches: &[Match],
    parent_metadatas: &mut [U128],
    ys: &mut [u32],
    metadatas: &mut [U128],
) {
    // TODO: Make a single input bounds check and use unsafe to avoid bounds check later
    let invocation_id = invocation_id.x;
    let num_workgroups = num_workgroups.x;
    let global_size = workgroup_size * num_workgroups;
    // TODO: More idiomatic version currently doesn't compile:
    // https://github.com/Rust-GPU/rust-gpu/issues/241#issuecomment-3005693043
    for index in (invocation_id..matches.len() as u32).step_by(global_size as usize) {
        let index = index as usize;
        let m = matches[index];
        let left_metadata = parent_metadatas[m.left_position as usize];
        let right_metadata = parent_metadatas[m.right_position as usize];
        let (y, metadata) = compute_fn_impl::<TABLE_NUMBER, PARENT_TABLE_NUMBER>(
            m.left_y,
            left_metadata,
            right_metadata,
        );
        ys[index] = y;
        // Last table doesn't have any metadata
        if TABLE_NUMBER < 7 {
            metadatas[index] = metadata;
        }
    }
}
```

Specifically, this part:
```rust
if TABLE_NUMBER < 7 {
    metadatas[index] = metadata;
}
```

Since `TABLE_NUMBER` is a constant, it is known at compile time that the branch is either always taken or never taken, yet:
```
error: Missing required capabilities for types
  |
  = note: `u8` type used without `OpCapability Int8`
```
What is even more frustrating, it doesn't point to any specific part of the code, so I was only able to find the cause by commenting out parts of the code until the error went away.
This doesn't need to be tackled in #317, it's just something I'd expect the compiler to be able to figure out too.
Could you try this? In your code you're explicitly requesting the comparison to be done in u8. The explicit cast will make the comparison happen in u32, and the new const folding should inline the cast, so there's no u8 left.
```rust
if TABLE_NUMBER as u32 < 7 {
    metadatas[index] = metadata;
}
```

A proper solution to this would probably be a small integer polyfill, similar to your Large integer polyfills.
Yes, it does work. But as far as I'm concerned it should be computed at compile time and not exist in SPIR-V in the first place. It is either dead code or unconditionally enabled.
I do think automatic widening might be necessary in some cases. For constants it is probably fine, and in cases like this too:

```rust
var << 2u8
```

In other cases it might have a non-negligible performance impact, in which case it might be better for the developer to make an explicit choice (to widen an integer, enable the Int8 capability, or rewrite the algorithm in a different way altogether).
#317 now has const folding support for compare operations (`icmp`) with ints and bools; that should fix this case as well.
Yes, that does help too, thanks!
I think what remains in my code now is various `const fn`s not being evaluated at compile time.
For example, this const fn takes two u8 constants and returns a u32 result (which should in turn be inlined because it is also only used in const expressions down the line), but it still requires the u8 capability:
https://github.com/nazar-pc/abundance/blob/0f15b6b6960b274e203396341848a9c6e9a977c7/crates/farmer/ab-proof-of-space-gpu/src/shader/compute_fn.rs#L52
I pushed my code into https://github.com/nazar-pc/abundance/tree/gpu-pos-prototype-compute_fn-wip
This is the reproduction:

```shell
cd crates/farmer/ab-proof-of-space-gpu
cargo build
```

It currently uses the rust-gpu revision from #317 (see the last commit in the branch). It should compile once this last issue is resolved (I worked around everything else).
With the current system, inlining function calls is not trivial and will likely need a full redesign, or rather a separate const folding system. Currently, we only const fold things as they are emitted. Afaik at codegen time, we have no idea what kinds of functions exist around us nor whether they're const or not. We just call them blindly and hope some other codegen unit emits the body of the function so the linker can happily link them together. So const folding function calls would require link-time const folding, where we actually know all the functions that exist and their contents.
For now, wrapping it in `const {}` to make rustc do the const eval would probably be the easiest. And feel free to turn your comment into a new issue about const fn evaluation.
Very interesting! I'm very surprised there is a difference between calling a const function with constant arguments and doing the same while wrapping it in a `const {}` block, but it does absolutely make a difference as you have described 🤯 This is not what I expected at all.
> I'm very surprised there is a difference between calling a const function with constant arguments and doing the same while wrapping it in a `const {}` block
This is part of standard Rust semantics: `const fn` merely allows calls to be done at compile time, but does not force that. In fact, there are some subtleties (e.g. the callee panicking, which cannot be arbitrarily moved from runtime to compile time) which mean that care has been taken (within rustc) to ensure runtime calls of const fns do not get moved to compile-time evaluation. (With a few exceptions due to `&'static` promotion rules being overly lax for a long while, mostly the fault of yours truly, I'm afraid.)
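A minimal illustration of that distinction (my own example, not from the thread):

```rust
const fn f() -> u32 {
    42
}

fn main() {
    let a = f();           // runtime call; folding is an optimization, not guaranteed
    const B: u32 = f();    // forced compile-time evaluation
    let c = const { f() }; // const block: also forced compile-time
    assert_eq!(a + B + c, 126);
}
```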
I do have constant-folding added to --spirt-passes=reduce in a branch somewhere, but it's sadly entangled with other SPIR-T changes and unlikely to land any time soon.
EDIT: so I drafted this comment days ago, but finally got back to this and you might want to try:

```toml
spirv-std = { git = "https://github.com/LykenSol/rust-gpu", branch = "rereduce-after-spirt-up-disaggregate" }
```

```shell
RUSTGPU_CODEGEN_ARGS="--no-early-report-zombies --spirt-passes=reduce --dump-spirt-passes=$PWD/spirt-passes"
```

(you might be able to get away without `--no-early-report-zombies`, but I included it just in case)
It might not even need #317, if I implemented all the same operations you need (I forget, it's been a while).
> With the current system, inlining function calls is not trivial and will likely need a full redesign, or rather a separate const folding system.
I would say we would need to move the inliner to SPIR-T, and have a simple heuristic of "how much can the callee shrink based on some subset of arguments being constant". (or even interprocedural const-folding - LLVM has that and it always trips people up who assume #[inline(never)] means the same as "optimizer shouldn't peek at the body" - AFAIK nothing like that exists without separate compilation or some type-checking-only features of proof assistants, e.g. "treat this as an axiom and never unfold its definition")
> This is part of standard Rust semantics: `const fn` merely allows calls to be done at compile-time, but does not force that
The reason I'm surprised is that it works for other targets and not with rust-gpu. What you wrote makes a lot of sense though.
> EDIT: so I drafted this comment days ago, but finally got back to this and you might want to try:
>
> ```toml
> spirv-std = { git = "https://github.com/LykenSol/rust-gpu", branch = "rereduce-after-spirt-up-disaggregate" }
> ```
>
> ```shell
> RUSTGPU_CODEGEN_ARGS="--no-early-report-zombies --spirt-passes=reduce --dump-spirt-passes=$PWD/spirt-passes"
> ```
>
> (you might be able to get away without `--no-early-report-zombies`, but I included it just in case)
>
> It might not even need #317, if I implemented all the same operations you need (I forget, it's been a while).
Tried the `rereduce-after-spirt-up-disaggregate` branch. Not a replacement for #317 and doesn't help with `const {}` either.
> The reason I'm surprised is that it works for other targets and not with rust-gpu.
Only optimizations can cause a runtime call to a function that happens to be a `const fn` to be constant-folded, across any target - you need to test with something like `-C no-prepopulate-passes` to disable all LLVM passes, even at optimization level 0 (and possibly even `-Z inline-mir=off`), to observe a similar effect.
(And even then, LLVM fundamentally has something akin to #317 as part of "attempting to create a runtime instruction", i.e. it will aggressively constant-fold operations between constant values w/o even any passes)
> Tried the `rereduce-after-spirt-up-disaggregate` branch. Not a replacement for #317 …
Weird, it should at least massively reduce the number of errors, but it could be something unimplemented etc.
(for better or worse, those changes are nowhere near landing, and I'm glad to play around with them to see what could be made to work)
> … and doesn't help with `const {}` either.
Yeah, that's so open-ended you could put yourself into a position where `const { ... }` compiles but spends a long time evaluating e.g. a very long computation, one that no mainstream optimizing compiler would bother trying to unfold at compile time to get a constant result (but you'd see such a thing also remain unoptimized in LLVM, in that case).
I should say that while I expect we might be able to remove the hack added in #302 (and rely on inlining + const-folding via reduce or something similar), I don't see a reason not to do #317 (at most I would want it to share code with the reduce implementation, but that's it), rustc_codegen_ssa already expects that kind of behavior.