Intrinsic sign_extend / zero_extend / truncate functions

Question

Opened this issue 2 years ago · 7 comments

I'm currently looking through the RISC-V reference files and am already spotting potential for improvement. The very first instruction I looked at (LUI) performs a sign extension by first casting imm to a signed type and then to an unsigned type of the correct width. According to the casting rules, there are actually three casts, because one happens implicitly: (unsigned<XLEN>)(signed<XLEN>)(signed<20>)imm.
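To make the cast chain concrete, here is a small Python model of what the three casts compute on raw bit patterns. It is only a sketch: `XLEN = 32`, the helper names, and the reading of each cast in the comments are assumptions for illustration, not CoreDSL code.

```python
XLEN = 32  # assumed register width for this illustration


def as_signed(bits: int, width: int) -> int:
    """Reinterpret the low `width` bits as a two's-complement integer."""
    bits &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return bits - (1 << width) if bits & sign_bit else bits


def as_unsigned(value: int, width: int) -> int:
    """Reinterpret an integer as an unsigned `width`-bit pattern."""
    return value & ((1 << width) - 1)


def lui_style_extend(imm: int) -> int:
    """Model of (unsigned<XLEN>)(signed<XLEN>)(signed<20>)imm."""
    s20 = as_signed(imm, 20)   # (signed<20>)imm: same bits, now read as signed
    s_xlen = s20               # (signed<XLEN>): value-preserving widening,
                               # which is where the sign extension happens
    return as_unsigned(s_xlen, XLEN)  # (unsigned<XLEN>): same bits, read as unsigned


print(hex(lui_style_extend(0x80000)))  # top bit of the 20-bit field is set
print(hex(lui_style_extend(0x7FFFF)))  # top bit clear, value is unchanged
```

Note that nothing in the cast chain names the operation; the sign extension is a side effect of the middle, implicit cast.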

It works, but in my opinion it's not very intuitive, since a cast doesn't clearly communicate which kind of extension occurs. Luckily, I built a very flexible system for intrinsic functions, so we can provide a sign_extend(width, expr) function to make the intent more explicit.

What do you think, is this a direction worth exploring?

Answer 1 · 2022-09-29T22:41:53.000Z

To expand on this further: these intrinsics could carry additional validation logic that checks that the extended type is not narrower than the source type. If XLEN were smaller than the width of imm, the existing code would just silently fail, whereas the intrinsic function could notify the implementor that a value can't be sign- or zero-extended to a type smaller than the one it started with. The exact opposite goes for a possible truncate function, which would raise an error if the requested width is larger than the width of the input value.
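The difference between the silent failure and the checked intrinsic can be sketched in Python. This is a model under assumptions: it treats an explicit narrowing cast as C-like truncation, and `cast_chain` and `checked_sign_extend` are hypothetical names, not part of CoreDSL. In the proposal the check would run statically, whereas this model raises at run time.

```python
def cast_chain(imm: int, src_width: int, xlen: int) -> int:
    """Model of (unsigned<xlen>)(signed<src_width>)imm with C-like narrowing."""
    imm &= (1 << src_width) - 1
    # Read the source bits as a two's-complement value.
    signed = imm - (1 << src_width) if imm >> (src_width - 1) else imm
    # Keep only xlen bits; this silently drops bits when xlen < src_width.
    return signed & ((1 << xlen) - 1)


def checked_sign_extend(imm: int, src_width: int, xlen: int) -> int:
    """Same result as cast_chain, but refuses to 'extend' to a narrower type."""
    if xlen < src_width:
        raise ValueError(
            f"cannot sign-extend a {src_width}-bit value to {xlen} bits"
        )
    return cast_chain(imm, src_width, xlen)


# With a hypothetical XLEN of 16, the cast chain quietly loses the upper bits:
print(hex(cast_chain(0x12345, 20, 16)))

# The checked variant reports the problem instead:
try:
    checked_sign_extend(0x12345, 20, 16)
except ValueError as err:
    print("error:", err)
```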

Answer 2 · 2022-09-30T01:21:37.000Z

Alright, quick update: I'm currently rewriting the RV32I ISA to give you a demonstration of what the language would look like with all of my new proposals applied. I must say that these functions are invaluable for the readability of the code. So here's a proper proposal instead of the rough idea above.

5 new functions:

signed<width> sign_extend(unsigned int width, signed<?> value) (only applicable to signed values)
bits<width> zero_extend(unsigned int width, bits<?> value) (if value is signed or unsigned, the return type is as well)
bits<width> truncate(unsigned int width, bits<?> value) (return value is always untyped)
signed<width> truncate_s(unsigned int width, bits<?> value) (identical to (signed)truncate(width, value))
unsigned<width> truncate_u(unsigned int width, bits<?> value) (identical to (unsigned)truncate(width, value))

All of the width parameters must be constant expressions. The extend functions will report an error if the requested width is smaller than that of the input value, the truncate functions if it is larger.
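As a rough executable model of the five signatures and their width checks, the following Python sketch may help. The `Value` class and its `kind` field are assumptions introduced only to stand in for CoreDSL's signed, unsigned, and untyped bit-vector types, and the checks raise at run time here, while the proposal describes them as errors on constant widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for a typed CoreDSL value."""
    width: int
    raw: int             # bit pattern, 0 <= raw < 2**width
    kind: str = "bits"   # "signed", "unsigned", or "bits" (untyped)


def _mask(width: int) -> int:
    return (1 << width) - 1


def sign_extend(width: int, value: Value) -> Value:
    if value.kind != "signed":
        raise TypeError("sign_extend only applies to signed values")
    if width < value.width:
        raise ValueError("sign_extend: requested width is smaller than the input")
    sign = (value.raw >> (value.width - 1)) & 1
    # Fill the new upper bits with copies of the sign bit.
    fill = (_mask(width) ^ _mask(value.width)) if sign else 0
    return Value(width, value.raw | fill, "signed")


def zero_extend(width: int, value: Value) -> Value:
    if width < value.width:
        raise ValueError("zero_extend: requested width is smaller than the input")
    # The new upper bits are zero; signedness of the input is preserved.
    return Value(width, value.raw, value.kind)


def truncate(width: int, value: Value) -> Value:
    if width > value.width:
        raise ValueError("truncate: requested width is larger than the input")
    return Value(width, value.raw & _mask(width), "bits")


def truncate_s(width: int, value: Value) -> Value:
    return Value(width, truncate(width, value).raw, "signed")


def truncate_u(width: int, value: Value) -> Value:
    return Value(width, truncate(width, value).raw, "unsigned")


imm = Value(20, 0x80000, "signed")
print(sign_extend(32, imm))
print(truncate_u(8, Value(32, 0x1234, "signed")))
```

A width equal to the input width is accepted by all five functions in this model, since the proposal only rejects a smaller width for the extend functions and a larger one for the truncate functions.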

The only annoyance is the constantly repeated XLEN parameter (which is also present with the old casts). As a remedy, I would suggest a [[default_truncate_width]] attribute that can be applied to the XLEN parameter declaration. When the attribute is present, the first parameter to truncate can be omitted and the annotated parameter is used instead.

Answer 3 · 2022-09-30T06:00:00.000Z

I'm strictly against the introduction of intrinsics for several reasons:

it bloats the language
the C-style casting fulfills the needs
CoreDSL1 had such constructs, and we decided to throw them away and stick to the C-style casting to have as few deviations as possible

Answer 4 · 2022-10-04T15:54:15.000Z

Thoughts:

Making the nature of the casts explicit does add some clarity to the CoreDSL description, but also means extra work for the user.
In the current system, only the source type determines whether a sign- or zero-extension is performed, which composes well with the implicit casts to a larger target type capable of representing all possible values of the source type. Are implicit casts still allowed in your proposal?
Also, wouldn't you have to prohibit extensions of bit vectors, in favor of concatenation with a constant?
I don't like the [[default_truncate_width]] attribute, because it controls the language/intrinsic semantics rather than the ISA.

Answer 5 · 2022-10-04T16:00:00.000Z

PS: The C-style cast syntax is not great. Personally, I'm annoyed that we often need multiple layers of parentheses.

((T<n>) (expr))

Maybe we should've gone for

T<n>(expr)

instead.

Answer 6 · 2022-10-04T16:15:08.000Z

Are implicit casts still allowed in your proposal?

Yes, none of the existing semantics would change; there would just be additional functions to make the operations more explicit and provide additional static validation. These would be entirely optional, so as an implementor you could choose to simply not use them. Ideally, these constructs would be treated as syntactic sugar and eliminated in the frontend, so backends would just see a regular cast (or rather, an even more generic bit swizzle).

Answer 7 · 2022-10-04T16:23:00.000Z

Also, wouldn't you have to prohibit extensions of bit vectors, in favor of concatenation with a constant?

Zero extension is a type-agnostic operation, so as described in the proposal, it can work on bit vectors. Sign extension, on the other hand, only makes sense on signed values and hence treats its operand as such. Whether we force users to make this explicit by casting the operand to (signed), or whether we accept any bit vector and cast it internally, is something we'd still need to decide. The proposal as it stands only accepts signed values for sign_extend.
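The distinction can be illustrated on raw bit patterns with a short Python sketch. The `concat` helper and both function names are hypothetical and only model the idea. Zero extension is the same as concatenating a constant run of zeros, while sign extension has to read the top bit of the operand as a sign.

```python
def concat(hi: int, lo: int, lo_width: int) -> int:
    """Place `hi` directly above the `lo_width`-bit pattern `lo`."""
    return (hi << lo_width) | lo


def zero_extend_bits(raw: int, src_width: int, width: int) -> int:
    # Equivalent to concatenation with a zero constant: no interpretation
    # of the operand is needed, so any bit vector works.
    return concat(0, raw, src_width)


def sign_extend_bits(raw: int, src_width: int, width: int) -> int:
    # The operand's top bit is read as a sign bit, i.e. the operand is
    # treated as signed regardless of how it was declared.
    sign = (raw >> (src_width - 1)) & 1
    fill = (1 << (width - src_width)) - 1 if sign else 0
    return concat(fill, raw, src_width)


pattern = 0b1010  # a 4-bit vector with its top bit set
print(format(zero_extend_bits(pattern, 4, 8), "08b"))
print(format(sign_extend_bits(pattern, 4, 8), "08b"))
```

The same 4-bit pattern yields two different 8-bit results, which is why the question of whether sign_extend should accept untyped bit vectors needs an explicit decision.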