phoboslab/qoa

What's the justification for using big-endian?

chocolate42 opened this issue · 3 comments

Little-endian seems to make more sense IMO. Every common architecture (x86/ARM/RISC-V) runs little-endian, so using big-endian forces the vast majority of hardware in use to do a bswap. It's not an expensive operation, but it's more expensive than not having to do anything at all.
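A minimal sketch of what that extra operation looks like in C, assuming the glibc/BSD be64toh helper: on big-endian hosts it's a no-op, on little-endian hosts it boils down to one byte-swap instruction per 64-bit word.

  #include <endian.h>
  #include <stdint.h>

  // Illustrative only: reading one 64-bit word on an LE host,
  // depending on how the format stores it.
  uint64_t load_le_stored(const uint64_t *p) { return *p; }          // plain load
  uint64_t load_be_stored(const uint64_t *p) { return be64toh(*p); } // load + byte swap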

See the addendum, right at the end: https://phoboslab.org/log/2023/02/qoa-time-domain-audio-compression

tl;dr: imagine how this would be stored on disk if it was LE:

.- QOA_SLICE -- 64 bits, 20 samples --------------------------/  /------------.
|        Byte[0]         |        Byte[1]         |  Byte[2]  \  \  Byte[7]   |
| 7  6  5  4  3  2  1  0 | 7  6  5  4  3  2  1  0 | 7  6  5   /  /    2  1  0 |
|------------+--------+--------+--------+---------+---------+-\  \--+---------|
|  sf_index  |  r00   |   r01  |   r02  |  r03    |   r04   | /  /  |   r19   |
`-------------------------------------------------------------\  \------------`
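For context: whichever way the bytes land on disk, the decoder does the same shifts once the word is in a register; only the initial load differs between BE and LE storage. A rough sketch, not the reference code, assuming the spec's layout of a 4-bit sf_index in the top bits followed by 20 three-bit residuals:

  #include <stdint.h>

  // Sketch only: unpack one slice after it has been loaded into a native uint64_t.
  void unpack_slice(uint64_t slice, int *sf_index, int residuals[20]) {
      *sf_index = (int)(slice >> 60) & 0xF;                  // top 4 bits
      for (int s = 0; s < 20; s++) {
          residuals[s] = (int)(slice >> (57 - s * 3)) & 0x7; // 3 bits each
      }
  }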

I just did a quick benchmark, decoding 9807 seconds of QOA (the Bladerunner 2049 audio track):
LE: 3.448s
BE: 3.534s

So yes, there's an impact, but it's marginal and in this case easily overruled by my sense of aesthetics :]

So yes, there's an impact, but it's marginal

That is not marginal: the LE version takes ~2.4% less effort for the same result ((3.534 - 3.448) / 3.534 ≈ 0.024). Multiplied by however many billion times the format may be encoded/decoded if it takes off, that's a lot of wasted cycles. Every ~41 decodes (1 / 0.024) you're in effect doing an extra decode for no reason.

QOI got a lot of criticism for its endianness.

That was a different kettle of fish. QOI's values are mostly bit-aligned, where endianness differences really are getting into the weeds. QOA has been designed to be packed into aligned 64-bit words, presumably to match common hardware whose native word size is 64 bits. It seems silly not to take that to its logical conclusion and use the common native endianness.

So, as a preventive measure – before you complain please ask yourself: does it really matter?

Only in the sense that LE is the objectively better choice for the vast majority of hardware out there that might use this format. In the context of the format, that's pretty important I'd argue; in the context of world events, not so much.

Storing slice bytes backwards just feels wrong. The layout on disk would make no sense. Explaining the backwards slice format would complicate a format that I worked very hard to simplify.

So for the sake of a single-line note in the spec stating that everything except the file_header is stored LE for performance reasons, you would choose worse performance. And for the sake of aesthetics that aren't even visible to users of the format, only to devs implementing it. Devs who would most likely prefer more performance over minor aesthetics. Sorry to bang on about it; hopefully the effort I'm putting into trying to convince you at least demonstrates that it's worth reconsidering.

How about an aesthetic argument from me. Say a decoder doesn't read a file directly but instead mmaps it, letting the OS handle the reading details.

  //file_header (first 8 bytes) already read with read();
  //mmap offsets must be page-aligned, so map the whole file and step past the header
  uint64_t *mapping;
  mapping = (uint64_t *)mmap(NULL, file.len, PROT_READ, MAP_SHARED, file.fd, 0) + 1;

An LE version of the format can use the mapping directly, whereas a BE version has to pull each word through a local variable (current_val = be64toh(*mapping++);) before it can process it, every time it advances.
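Spelled out as a sketch, with process_word and num_words as placeholders, and treating every 64-bit word after the file header the same for brevity:

  // LE file on an LE host: the mapped words are usable as-is
  for (size_t i = 0; i < num_words; i++) {
      process_word(mapping[i]);
  }

  // BE file: every word has to be swapped into a local first
  for (size_t i = 0; i < num_words; i++) {
      uint64_t current_val = be64toh(mapping[i]);
      process_word(current_val);
  }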

at least demonstrates that it's worth reconsidering.

It certainly is, and I spent a bunch of time pondering this question. I just arrived at a different conclusion.

With the intended use case – for self-contained applications/games, not as an interchange format – and the "hackability" of QOA, a developer could easily change it to LE if these 2.4% savings are necessary.
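For illustration only, with placeholder helper names rather than qoa.h's actual API: if an implementation funnels all of its 64-bit reads and writes through a pair of helpers like these, the switch to LE is contained in two small functions.

  #include <stdint.h>

  // Hypothetical helpers, not qoa.h's API: if all 64-bit I/O goes through a
  // pair like this, flipping the format's byte order is a two-function change.
  static uint64_t read_u64_le(const unsigned char *b) {
      uint64_t v = 0;
      for (int i = 0; i < 8; i++) { v |= (uint64_t)b[i] << (i * 8); }
      return v;
  }
  static void write_u64_le(uint64_t v, unsigned char *b) {
      for (int i = 0; i < 8; i++) { b[i] = (unsigned char)(v >> (i * 8)); }
  }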