
Feature request: pattern match for rotl / rotr


It would be nice if

#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))

could be translated to

__builtin_rotateleft32(...)

Emscripten (emcc, then wasm2wat | grep -i rot) did not do this automatically, neither on Ubuntu 24.04 x86_64 nor on Raspberry Pi OS (Debian bookworm arm64).
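
For reference, a minimal test case along the lines of what I tried (the file name, function name, and exact flags here are just illustrative):

// rot.c - reduced test case; built with something like:
//   emcc -O2 --no-entry rot.c -o rot.wasm && wasm2wat rot.wasm | grep -i rot
#include <stdint.h>
#include <emscripten.h>

#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))

// EMSCRIPTEN_KEEPALIVE keeps the function from being stripped as dead code.
EMSCRIPTEN_KEEPALIVE
uint32_t rot_test(uint32_t x, unsigned int c)
{
    return ROTL(x, c);  // hoped-for lowering: a single i32.rotl
}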

To make this safe for a rotate count of zero, where (a) >> (32 - (0)) becomes a >> 32 (undefined behaviour: shift count equal to the type width), you can use

#include <stdint.h>   // uint32_t
#include <limits.h>   // CHAR_BIT

static inline uint32_t rotl32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);  // assumes width is a power of 2.

  // assert (c<=mask && "rotate by type width or more");
  c &= mask;
  return (n<<c) | (n>>( (-c)&mask ));
}
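
For completeness, here is the mirrored rotate-right idiom plus a tiny self-test (my own sketch, assuming rotl32 from above is in the same file; the test values are arbitrary):

#include <assert.h>

// The same masked-count idiom, mirrored for rotate-right.
static inline uint32_t rotr32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);
  c &= mask;
  return (n>>c) | (n<<( (-c)&mask ));
}

int main (void)
{
  assert (rotl32 (0x80000001u, 1) == 0x00000003u);  // the high bit wraps around to bit 0
  assert (rotl32 (0xDEADBEEFu, 0) == 0xDEADBEEFu);  // count 0 is safe: no UB shift
  assert (rotr32 (0x00000003u, 1) == 0x80000001u);  // rotr undoes rotl
  return 0;
}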

See https://stackoverflow.com/questions/776508/best-practices-for-circular-shift-rotate-operations-in-c - GCC and Clang have optimized this to a rotate instruction for years, at least when targeting real ISAs. I haven't tested with WASM.

If shifts in Emscripten were well-defined for counts equal to or wider than the type width, or if you're using 64-bit integers for this 32-bit shift, then the optimization you're hoping for wouldn't be correct for your code: ROTL(a, 42) would have to shift all the bits out in the a<<42 part, and do something else in the a>>(32-42), i.e. a>>(-10), part.
Masking the shift count avoids this problem and makes it a 32-bit rotate.
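
A quick sanity check of the count masking with c = 42 (worked arithmetic of my own, nothing project-specific; C11's _Static_assert verifies it at compile time):

/* With a 32-bit type, mask == 31, so an out-of-range count wraps: */
_Static_assert((42u & 31u) == 10u, "42 mod 32 == 10");
/* The complementary count (-c) & mask is what feeds the right shift: */
_Static_assert(((0u - 42u) & 31u) == 22u, "-42 mod 32 == 22");
/* 10 + 22 == 32, so rotl32(x, 42) == rotl32(x, 10): the same count
   masking that x86 rotates and WASM's i32.rotl apply to
   out-of-range counts. */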

(Unless the type you're rotating with this macro is wider than 32 bits; then the a>>(32-b) part shifts in some of the high bits instead of zeros. An inline function taking a fixed-width uint32_t doesn't have this problem.)
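
To illustrate that wider-type pitfall with concrete (made-up) values:

#include <stdint.h>
#include <assert.h>

#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))

int main (void)
{
  uint64_t x = 0x100000000ull;   // only bit 32 set; the low 32 bits are zero
  // x>>31 pulls bit 32 down into bit 1, so the "rotate" result contains
  // a bit that came from the high half rather than wrapping the low 32 bits:
  assert (ROTL(x, 1) == (0x200000000ull | 0x2ull));
  // A fixed-width inline function truncates to 32 bits up front instead:
  // rotl32 ((uint32_t)x, 1) == 0, a true rotate of the (all-zero) low half.
  return 0;
}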

I can confirm that Clang targeting WASM also does this optimization with the example @pcordes gave. So I think we can close this.