Change wire format to little-endian
nigeltao opened this issue ยท 4 comments
Spun out of #28 (comment) where I said:
I would instead advise to make everything little endian. Not just the header fields, but also the bytecodes. Almost all CPUs used today are little-endian, and support unaligned loads.
I just uploaded a proof of concept which showed an approx 1.05x improvement in decode speed. Copy/pasting the commit message:
Change wire format to little-endian
This is a backwards incompatible change to the wire format. For example,
the QOI_DIFF_24 bit-packing encoding changes from:
| MSB 6 5 4 3 2 1 LSB |
+---------------------------------+
| 1 1 1 0 r4 r3 r2 r1 | addr+0
| r0 g4 g3 g2 g1 g0 b4 b3 | addr+1
| b2 b1 b0 a4 a3 a2 a1 a0 | addr+2
to:
| MSB 6 5 4 3 2 1 LSB |
+---------------------------------+
| r3 r2 r1 r0 0 1 1 1 | addr+0
| b1 b0 g4 g3 g2 g1 g0 r4 | addr+1
| a4 a3 a2 a1 a0 b4 b3 b2 | addr+2
----
Decode speed-up on an Intel NUC (Comet Lake, BXNUC10i5FNKPA):
1.07x images/kodak
1.04x images/misc
1.01x images/screenshots
1.06x images/textures
1.05x images/wallpaper
In detail:
images/kodak
decode ms encode ms decode mpps encode mpps size kb
libpng: 8.0 144.2 49.02 2.73 717
stbi: 8.8 84.2 44.43 4.67 979
qoi: 3.6 4.2 109.80 93.41 771
qoile: 3.3 4.2 117.70 93.41 771
images/misc
decode ms encode ms decode mpps encode mpps size kb
libpng: 9.3 89.1 82.18 8.57 335
stbi: 8.2 77.5 93.17 9.85 497
qoi: 2.8 3.1 275.03 245.73 451
qoile: 2.7 3.1 285.92 245.95 451
images/screenshots
decode ms encode ms decode mpps encode mpps size kb
libpng: 45.3 519.5 181.85 15.84 2219
stbi: 35.2 622.5 233.73 13.22 2821
qoi: 24.3 23.9 339.08 344.26 2582
qoile: 23.9 23.9 343.80 344.20 2582
images/textures
decode ms encode ms decode mpps encode mpps size kb
libpng: 2.6 33.1 50.84 3.92 163
stbi: 2.5 19.8 52.26 6.56 232
qoi: 0.9 1.0 149.74 130.18 184
qoile: 0.8 1.0 158.48 130.22 184
images/wallpaper
decode ms encode ms decode mpps encode mpps size kb
libpng: 154.4 2289.3 60.71 4.09 9224
stbi: 190.0 1455.1 49.32 6.44 13299
qoi: 71.2 76.6 131.57 122.35 10647
qoile: 67.9 76.6 138.11 122.37 10647
The key difference between qoi.h
and qoile.h
is:
391c183
< void *qoi_decode(const void *data, int size, int *out_w, int *out_h, int channels) {
---
> void *qoile_decode(const void *data, int size, int *out_w, int *out_h, int channels) {
427c219
< int b1 = bytes[p++];
---
> uint32_t b = peek_u32le(bytes + p);
429,462c221,257
< if ((b1 & QOIBE_MASK_2) == QOIBE_INDEX) {
< px = index[b1 ^ QOIBE_INDEX];
< }
< else if ((b1 & QOIBE_MASK_3) == QOIBE_RUN_8) {
< run = (b1 & 0x1f);
< }
< else if ((b1 & QOIBE_MASK_3) == QOIBE_RUN_16) {
< int b2 = bytes[p++];
< run = (((b1 & 0x1f) << 8) | (b2)) + 32;
< }
< else if ((b1 & QOIBE_MASK_2) == QOIBE_DIFF_8) {
< px.rgba.r += ((b1 >> 4) & 0x03) - 1;
< px.rgba.g += ((b1 >> 2) & 0x03) - 1;
< px.rgba.b += ( b1 & 0x03) - 1;
< }
< else if ((b1 & QOIBE_MASK_3) == QOIBE_DIFF_16) {
< int b2 = bytes[p++];
< px.rgba.r += (b1 & 0x1f) - 15;
< px.rgba.g += (b2 >> 4) - 7;
< px.rgba.b += (b2 & 0x0f) - 7;
< }
< else if ((b1 & QOIBE_MASK_4) == QOIBE_DIFF_24) {
< int b2 = bytes[p++];
< int b3 = bytes[p++];
< px.rgba.r += (((b1 & 0x0f) << 1) | (b2 >> 7)) - 15;
< px.rgba.g += ((b2 & 0x7c) >> 2) - 15;
< px.rgba.b += (((b2 & 0x03) << 3) | ((b3 & 0xe0) >> 5)) - 15;
< px.rgba.a += (b3 & 0x1f) - 15;
< }
< else if ((b1 & QOIBE_MASK_4) == QOIBE_COLOR) {
< if (b1 & 8) { px.rgba.r = bytes[p++]; }
< if (b1 & 4) { px.rgba.g = bytes[p++]; }
< if (b1 & 2) { px.rgba.b = bytes[p++]; }
< if (b1 & 1) { px.rgba.a = bytes[p++]; }
---
> if ((b & QOILE_MASK_2) == QOILE_INDEX) {
> px = index[(b >> 2) & 63];
> p += 1;
> }
> else if ((b & QOILE_MASK_3) == QOILE_RUN_8) {
> run = ((b >> 3) & 0x1F);
> p += 1;
> }
> else if ((b & QOILE_MASK_3) == QOILE_RUN_16) {
> run = ((b >> 3) & 0x1FFF);
> p += 2;
> }
> else if ((b & QOILE_MASK_2) == QOILE_DIFF_8) {
> px.rgba.r += ((b >> 2) & 0x03) - 1;
> px.rgba.g += ((b >> 4) & 0x03) - 1;
> px.rgba.b += ((b >> 6) & 0x03) - 1;
> p += 1;
> }
> else if ((b & QOILE_MASK_3) == QOILE_DIFF_16) {
> px.rgba.r += ((b >> 3) & 0x1F) - 15;
> px.rgba.g += ((b >> 8) & 0x0F) - 7;
> px.rgba.b += ((b >> 12) & 0x0F) - 7;
> p += 2;
> }
> else if ((b & QOILE_MASK_4) == QOILE_DIFF_24) {
> px.rgba.r += ((b >> 4) & 0x1F) - 15;
> px.rgba.g += ((b >> 9) & 0x1F) - 15;
> px.rgba.b += ((b >> 14) & 0x1F) - 15;
> px.rgba.a += ((b >> 19) & 0x1F) - 15;
> p += 3;
> }
> else if ((b & QOILE_MASK_4) == QOILE_COLOR) {
> p += 1;
> if (b & 0x10) { px.rgba.r = bytes[p++]; }
> if (b & 0x20) { px.rgba.g = bytes[p++]; }
> if (b & 0x40) { px.rgba.b = bytes[p++]; }
> if (b & 0x80) { px.rgba.a = bytes[p++]; }
If the highest bits of a byte decide what should be done the checks can be simplified.
My (Seed7) code works with the following definitions:
const integer: QOI_INDEX is 16#3f; # 00xxxxxx
const integer: QOI_RUN_8 is 16#5f; # 010xxxxx
const integer: QOI_RUN_16 is 16#7f; # 011xxxxx
const integer: QOI_DIFF_8 is 16#bf; # 10xxxxxx
const integer: QOI_DIFF_16 is 16#df; # 110xxxxx
const integer: QOI_DIFF_24 is 16#ef; # 1110xxxx
const integer: QOI_COLOR is 16#ff; # 1111xxxx
This allows checks without masks:
if byte1 <= QOI_INDEX then
...
elsif byte1 <= QOI_RUN_8 then
...
elsif byte1 <= QOI_RUN_16 then
...
elsif byte1 <= QOI_DIFF_8 then
...
elsif byte1 <= QOI_DIFF_16 then
...
elsif byte1 <= QOI_DIFF_24 then
...
else # byte1 <= QOI_COLOR
...
end if;
This trick works also in C. I have not measured how much time is saved with this in C. If the lowest bits are used to do the decisions this type of optimization is not possible.
The file format for QOI will not change anymore. See #37 for more info.
Ideas for a successor to QOI should be discussed here: https://github.com/nigeltao/qoi2-bikeshed/issues
Please reconsider. This is a huge wart for the spec, if you ask me. There's little reason to have big-endian ints in 2021.
As far as I can tell it's only the header fields that are big endian. The rest of this diff is about the bit encoding inside the bytes which isn't really endianness related (and I think @ThomasMertes is right, it makes sense to keep it as-is).
I totally agree though it's a super weird decision to make the headers big endian and should probably be changed.