missing rotl/rotr in wasm

Question

missing rotl/rotr in wasm

Closed this issue 2 months ago · 9 comments

Using the C code from https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant , I'll see no rotl/rotr instructions after decompiling wasm with wasm2wat ;

but when using assemblyscript

// ...
x0 += x4; x12 ^= x0; x12 = rotl<u32>(x12, 16);
// ...

it does generate rotl in wasm & is 10 times faster than emcc -O3 .

Answer 1 · 2025-08-26T01:37:11.000Z

I guess maybe the llvm backend is not generating these instructions. Can you share the C code where you expect them to be generated?

Answer 2 · 2025-08-27T18:51:32.000Z

#include <stdio.h>
#define uint32_t unsigned

// https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant

#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))
#define QR(a, b, c, d) (             \
	a += b, d ^= a, d = ROTL(d, 16), \
	c += d, b ^= c, b = ROTL(b, 12), \
	a += b, d ^= a, d = ROTL(d,  8), \
	c += d, b ^= c, b = ROTL(b,  7))
#define ROUNDS 20

uint32_t chacha_block(uint32_t out[16], uint32_t const in[16])
{
	int i;
	uint32_t x[16];

	for (i = 0; i < 16; ++i)
		x[i] = in[i];
	// 10 loops × 2 rounds/loop = 20 rounds
	for (i = 0; i < ROUNDS; i += 2) {
		// Odd round
		QR(x[0], x[4], x[ 8], x[12]); // column 1
		QR(x[1], x[5], x[ 9], x[13]); // column 2
		QR(x[2], x[6], x[10], x[14]); // column 3
		QR(x[3], x[7], x[11], x[15]); // column 4
		// Even round
		QR(x[0], x[5], x[10], x[15]); // diagonal 1 (main diagonal)
		QR(x[1], x[6], x[11], x[12]); // diagonal 2
		QR(x[2], x[7], x[ 8], x[13]); // diagonal 3
		QR(x[3], x[4], x[ 9], x[14]); // diagonal 4
	}
	for (i = 0; i < 16; ++i)
		out[i] = x[i] + in[i];

	return out[0];
}

uint32_t in[16]={0x61707865, 0x3320646e, 0x79622d32, 0x6b206574,
	0xfb2e460e, 0xb00a2090, 0x700a1db2, 0x50c0c43f,
	0x713d2225, 0x43334aac, 0x54aff28d, 0x8f1feb3d,
	0, 0, 0, 0};
uint32_t out[16];
int main(int argc, char **argv) {

	while(chacha_block(out, in) > 0x100) in[12]++;

	for(int i=0; i<16; i++) {
		printf("%2d: %08x -> %08x\n", i, in[i], out[i]);
	}
	printf("in[12] = %d\n", in[12]);
	return 0;
}

Answer 3 · 2025-08-27T18:52:07.000Z

please note the #define ROTL ...

Answer 4 · 2025-08-27T22:26:05.000Z

I see, so you are expecting the compiler (clang/llvm) to lower the #define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b)))) sequence into wasm rotl/rotr instructions but it isn't?

@dschuff @tlively do you know if/why LLVM doesn't do this?

Answer 5 · 2025-08-27T23:08:17.000Z

It looks like we do have at least some lowering to rotl and rotr in llvm: https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/WebAssembly/rotate-i3264.ll

And more in llvm/test/CodeGen/WebAssembly/i32.ll.

As to why the particular sequence in #define ROTL is not getting lowers this far I don't know. Do any other llvm targets lower that macro to single rotl/rotr instructions?

Answer 6 · 2025-08-27T23:21:01.000Z

This might be something we can look into, whether this should pattern match. To work around this, can you see if the __builtin_rotateleft clang builtin works for you?

Answer 7 · 2025-08-29T08:41:10.000Z

Helps!

> wasm2wat chacha20.wasm|egrep 'rot[lr]'|uniq -c
     32         i32.rotl

Thanks!

Answer 8 · 2025-08-29T17:17:33.000Z

This could still be a feature request for LLVM to pattern match this. Do you mind filing an issue against the wasm backend in https://github.com/llvm/llvm-project ? Do you happen to know if this works on other platforms such as ARM or x86? There might very well be a generic pass somewhere that will match this code into an fshl intrinsic (which we would then lower to the wasm rotate instruction).

Answer 9 · 2025-09-01T12:33:12.000Z

llvm/llvm-project#156322