WebAssembly/simd

Consider a more efficient encoding for v128.const 0

ngzhian opened this issue · 10 comments

v128.const 0 is the most common v128.const operation I see in some benchmarks (>10x more frequent than other v128.const constants). It currently takes 18 bytes (16 of which are 0). Perhaps a more efficient way of encoding it (a special instruction) should be considered (for a subsequent proposal, of course)?

Would it be more efficient to just (say) i8x16.splat 0 ?

EDIT: oops, I guess there is no immediate form of the splat instructions.

Yes, it would be more efficient (in code size) to emit:

i32.const 0 (2 bytes)
i8x16.splat (2 bytes)

However, the toolchain will mostly emit a v128.const 0 for that (cc @tlively). I think a special case for a splat (of any shape) of constant 0, emitting the splat instead of v128.const, would be nice. That said, binary size isn't a huge problem for now (I haven't gotten reports about it yet!), so I'm just filing this to track it :)
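To make the byte counts above concrete, here is a minimal Python sketch that builds both instruction sequences and compares their lengths. The opcode values (0xfd prefix, 0x0c for v128.const, 0x0f for i8x16.splat, 0x41 for i32.const) are assumed from the finalized SIMD binary format and may differ in older toolchain builds; the helper names are made up for illustration.

```python
# Opcode bytes assumed from the finalized SIMD binary format.
SIMD_PREFIX = 0xFD
V128_CONST = 0x0C
I8X16_SPLAT = 0x0F
I32_CONST = 0x41


def encode_v128_const(value: bytes) -> bytes:
    """Encode v128.const: 2 opcode bytes followed by a 16-byte literal."""
    if len(value) != 16:
        raise ValueError("v128 literal must be exactly 16 bytes")
    return bytes([SIMD_PREFIX, V128_CONST]) + value


def encode_splat_zero() -> bytes:
    """Encode `i32.const 0` followed by `i8x16.splat`."""
    # i32.const takes a signed LEB128 immediate; zero fits in one byte.
    return bytes([I32_CONST, 0x00, SIMD_PREFIX, I8X16_SPLAT])


const_form = encode_v128_const(bytes(16))
splat_form = encode_splat_zero()
print(len(const_form), len(splat_form))  # 18 4
```

Under these assumptions the splat form is 4 bytes against 18 for the constant, which is the gap a toolchain special case would recover.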

penzn commented

I think that is the easiest fix :) We can possibly consider using variable integer encoding of up to 128 bits, though that can be slightly awkward.

Related: #255

It would be good to figure out what a typical percentage of code size this would save. If there are situations in which splats (or other patterns) would be faster than v128.const, that would be especially good to know about, because that would be a clear win.

Splats would never be faster than v128.const, because v128.const guarantees that the value is a static literal, while a splat's operand need not be constant.

penzn commented

We can use the trusty LEB encoding which we already use in other places.
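As a rough illustration of what a LEB-style immediate would cost, the sketch below encodes unsigned LEB128 values in Python. Treating the v128 literal as one unsigned 128-bit integer is an assumption made only for this example; the thread does not say how lanes would be mapped onto the integer, and the function names are hypothetical.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_uleb128(data: bytes) -> int:
    """Decode an unsigned LEB128 value from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result
    raise ValueError("truncated LEB128 value")


zero_len = len(encode_uleb128(0))
max_len = len(encode_uleb128((1 << 128) - 1))
print(zero_len, max_len)  # 1 19
```

With this scheme a zero literal shrinks from 16 bytes to 1, while a literal with its top bits set grows to 19 bytes, which may be part of why a variable-length 128-bit immediate is awkward.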

Say we have a 2-byte encoding. In a release-built Wasm file that is 4082107 bytes (with 582 v128.const instructions, 297 of which are v128.const 0), we can save (18 - 2) * 297 = 4752 bytes (about 0.1%).

The gzipped version of the file is 2077721 bytes. I locally replaced each v128.const 0 with the two bytes 0x7b 0x7b; the resulting gzipped file is 2077463 bytes (a difference of 258 bytes, or 0.01%).

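# Replace every v128.const 0 (0xfd 0x0c + 16 zero bytes) with a 2-byte placeholder.
with open('release.wasm', 'rb') as f:
    s = f.read()
t = s.replace(b'\xfd\x0c' + b'\x00'*16, b'\x7b\x7b')
with open('release-new.wasm', 'wb') as f:
    f.write(t)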
$ gzip -9 -c release-new.wasm > release-new.wasm.gz
$ du -b release*
4077355 release-new.wasm
2077463 release-new.wasm.gz
4082107 release.wasm
2077721 release.wasm.gz
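The same measurement can be sketched entirely in Python, which avoids the separate gzip and du steps. The `measure` function and its return keys are hypothetical names for this example, and the input below is synthetic filler rather than a real module. Like the script above, it does a raw byte replacement, so it is only a size estimate: the pattern could also match inside data segments, and the output is not a valid module.

```python
import gzip

# v128.const opcode (0xfd 0x0c) followed by 16 zero bytes, as in the script above.
V128_CONST_ZERO = b"\xfd\x0c" + b"\x00" * 16
# Stand-in for a hypothetical 2-byte encoding.
PLACEHOLDER = b"\x7b\x7b"


def measure(module: bytes) -> dict:
    """Estimate raw and gzipped savings from shrinking v128.const 0 to 2 bytes."""
    shrunk = module.replace(V128_CONST_ZERO, PLACEHOLDER)
    return {
        "count": module.count(V128_CONST_ZERO),
        "raw_saved": len(module) - len(shrunk),
        "gzip_saved": len(gzip.compress(module, 9)) - len(gzip.compress(shrunk, 9)),
    }


# Synthetic input: filler without zero bytes, interleaved with 10 zero constants.
filler = bytes(range(1, 256)) * 4
synthetic = (filler + V128_CONST_ZERO) * 10
stats = measure(synthetic)
print(stats["count"], stats["raw_saved"])  # 10 160
```

On a real build, something like `measure(open('release.wasm', 'rb').read())` would report the same quantities as the du listing above.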

So it turns out there are not a lot of savings! Streams of zeros compress well, and v128.const instructions don't show up that often. I'm sad about this result, but it's good to have some rough numbers here.

I'm going to close this out, since it doesn't seem that useful; we can follow up in #255 anyway. Thanks all for the comments!

Thanks for putting in the legwork to investigate that, @ngzhian!