missing rotl/rotr in wasm
Closed this issue · 9 comments
Using the C code from https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant , I'll see no rotl/rotr instructions after decompiling wasm with wasm2wat ;
but when using assemblyscript
// ...
x0 += x4; x12 ^= x0; x12 = rotl<u32>(x12, 16);
// ...it does generate rotl in wasm & is 10 times faster than emcc -O3 .
I guess maybe the llvm backend is not generating these instructions. Can you share the C code where you expect them to be generated?
#include <stdio.h>
#define uint32_t unsigned
// https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant
#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))
#define QR(a, b, c, d) ( \
a += b, d ^= a, d = ROTL(d, 16), \
c += d, b ^= c, b = ROTL(b, 12), \
a += b, d ^= a, d = ROTL(d, 8), \
c += d, b ^= c, b = ROTL(b, 7))
#define ROUNDS 20
uint32_t chacha_block(uint32_t out[16], uint32_t const in[16])
{
int i;
uint32_t x[16];
for (i = 0; i < 16; ++i)
x[i] = in[i];
// 10 loops × 2 rounds/loop = 20 rounds
for (i = 0; i < ROUNDS; i += 2) {
// Odd round
QR(x[0], x[4], x[ 8], x[12]); // column 1
QR(x[1], x[5], x[ 9], x[13]); // column 2
QR(x[2], x[6], x[10], x[14]); // column 3
QR(x[3], x[7], x[11], x[15]); // column 4
// Even round
QR(x[0], x[5], x[10], x[15]); // diagonal 1 (main diagonal)
QR(x[1], x[6], x[11], x[12]); // diagonal 2
QR(x[2], x[7], x[ 8], x[13]); // diagonal 3
QR(x[3], x[4], x[ 9], x[14]); // diagonal 4
}
for (i = 0; i < 16; ++i)
out[i] = x[i] + in[i];
return out[0];
}
uint32_t in[16]={0x61707865, 0x3320646e, 0x79622d32, 0x6b206574,
0xfb2e460e, 0xb00a2090, 0x700a1db2, 0x50c0c43f,
0x713d2225, 0x43334aac, 0x54aff28d, 0x8f1feb3d,
0, 0, 0, 0};
uint32_t out[16];
int main(int argc, char **argv) {
while(chacha_block(out, in) > 0x100) in[12]++;
for(int i=0; i<16; i++) {
printf("%2d: %08x -> %08x\n", i, in[i], out[i]);
}
printf("in[12] = %d\n", in[12]);
return 0;
}please note the #define ROTL ...
It looks like we do have at least some lowering to rotl and rotr in llvm: https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/WebAssembly/rotate-i3264.ll
And more in llvm/test/CodeGen/WebAssembly/i32.ll.
As to why the particular sequence in #define ROTL is not getting lowers this far I don't know. Do any other llvm targets lower that macro to single rotl/rotr instructions?
This might be something we can look into, whether this should pattern match. To work around this, can you see if the __builtin_rotateleft clang builtin works for you?
Helps!
> wasm2wat chacha20.wasm|egrep 'rot[lr]'|uniq -c
32 i32.rotl
Thanks!
This could still be a feature request for LLVM to pattern match this. Do you mind filing an issue against the wasm backend in https://github.com/llvm/llvm-project ? Do you happen to know if this works on other platforms such as ARM or x86? There might very well be a generic pass somewhere that will match this code into an fshl intrinsic (which we would then lower to the wasm rotate instruction).