etemesi254/zune-image

Expand BmpDecoder::decode_into to support writing BGRA32 with a provided stride

Opened this issue · 28 comments

Hi!

I have a feature request here for the bmp decoder:
Currently (from looking at the code), it looks like the decoder fills a u8 buffer with no padding, and some variant of RGB or RGBA depending on the format. It's not really clear the exact byte format used and it can vary based on the input.

In my framework I use BGRA32 and a stride with row padding (rounding the rows up so that SIMD operations don't slow down from misalignment). Would it be possible to add a decode_into_format(&mut [u8], PixelLayout::BGRA32, stride: u32) method or something similar? Or let me know if there is an API I should be using instead.

Thanks!

Curiously, I'm seeing it decode into either RGBA32 or BGR24

Hi, yes, we can only know the output format based on type, bmp doesn't tell us the format output, no one really knows the format output, we basically just try to agree to what is the current accepted (what web browsers do). There is https://blog.mozilla.org/nnethercote/2015/11/06/i-rewrote-firefoxs-bmp-decoder/ where someone talks about Firefox bmp decoder.

In my framework I use BGRA32 and a stride with row padding (rounding the rows up so that SIMD operations don't slow down from misalignment). Would it be possible to add a decode_into_format(&mut [u8], PixelLayout::BGRA32, stride: u32) method or something similar? Or let me know if there is an API I should be using instead.

This isn't implemented in any place, both in a decoder and in zune-image as a whole and I am not sure implementing it will bring any sort of performance improvements anywhere. I haven't seen slowdowns when loading and writing unaligned storage with SIMD on latest CPUs. And it's quite a headache to get it right but will complicate the api.

hi, the behaviour of the decoders is to preserve as much information as possible, so that means if the image is rgb we return rgb, if rgba we return rgba

all of the decoders do that, via the method (get)_colorspace

it's usually enough to decode the image headers to know which output format it will be

let mut decoder = BmpDecoder::new(Cursor::new([]));

decoder.decode_headers().unwrap()

let mut colorspace= decoder.colorspace();

if colorspace == Colorspace::RGB{
  // format is rgb24/bgr24
} 
if colorspace == Colorspace::RGBA{
  // format is rgba32
}
// do decoding now
decoder.decode();

The code above is pseudocode and probably doesn't run(I'm typing this on mobile) but it shows the general way to know what type

Initially in my naive ways I messed pixel format and Colorspace and I'm not sure it's worth changing the whole thing (colorspace here may be what libraries like ffmpeg call pixel format)

And I'm using
rgb24.bmp and rgba32-1.bmp from your test images and seeing bgr24

I may not get you here, when you say bgr24 do you mean you are packing them into a larger type, like u32?

Would it be possible if you could share the code you are using? Because I'm not sure the library can do that.

Running ./zune -i ./rgb24.bmp --view --trace using https://github.com/etemesi254/zune-image/releases/tag/v0.5.0-rc0 binary prints

INFO  [zune_bin::cmd_parsers::global_options] Initialized logger
INFO  [zune_bin::cmd_parsers::global_options] Log level :TRACE
INFO  [zune_bin::workflow] Creating workflows from input

TRACE [zune_image::pipelines] Current state: Decode

TRACE [zune_bmp::decoder] Width: 127
TRACE [zune_bmp::decoder] Height: 64
TRACE [zune_bmp::decoder] Pixel format : RGB
TRACE [zune_bmp::decoder] Compression  : RGB
TRACE [zune_bmp::decoder] Bit depth: 24
TRACE [zune_image::pipelines] Finished decoding in 0 ms
TRACE [zune_image::pipelines] Finished operations for this workflow
TRACE [zune_bin::show_gui] Wrote 24540 bytes

(On Linux). The pixel format is rgb, and depth 24, meaning it's rgb24, if it were bgr it would indicate pixel format being bgr as we have support for BGR images

Pushed 0.5.0-rc0. Could you test if this is the same behaviour? And if possible could you share the code you are using?

I migrated to the new traits etc, but the issue remains. I've simplified it here:

use zune_bmp::zune_core::bytestream::ZCursor;

#[test]
pub fn test_rgba32_bmp_channel_order(){
    let bytes32 = imageflow_http_helpers::fetch_bytes("https://raw.githubusercontent.com/etemesi254/zune-image/dev/test-images/bmp/rgba32-1.bmp").unwrap();
    let bytes24 = imageflow_http_helpers::fetch_bytes("https://raw.githubusercontent.com/etemesi254/zune-image/dev/test-images/bmp/rgb24.bmp").unwrap();

    let mut decoder32 = zune_bmp::BmpDecoder::new(ZCursor::new(&bytes32));
    let decoded_bytes = decoder32.decode().unwrap();
    // We know the first pixel is more red than blue, so RGB order means byte 0 is greater than byte 2
    assert!(decoded_bytes[0] > decoded_bytes[2]);
    // The above passes

    let mut decoder24 = zune_bmp::BmpDecoder::new(ZCursor::new(&bytes24));
    let decoded_bytes = decoder24.decode().unwrap();
    // We know the first pixel is more red than blue, so RGB order means byte 0 is greater than byte 2
    assert!(decoded_bytes[0] > decoded_bytes[2]);
    // But this one fails.
}

Aah, now I see, sorry for the long back and forth. It's an optimization gone wrong.

I will have a fix in ~24 hours. Thank you for your patience

5631f52 fixes for 24 bit and 32 bit. BMP has a lot of weird quirks I'm learning.

Thanks for making it consistent, I was concerned about future breakage. However, since I'm adding this format primarily as a high-performance buffer format for interoperability with other software that also generates images, I'm concerned that swapping bytes in every pixel only to immediately swap them back will add too much overhead. Like most windows-interacting image software, rendering is done in BGRA (just like bitmaps), so it's quite redundant to have to do per-pixel work.

Actually, it's the opposite. Nearly all computing platforms use little-endian nowadays, which means that the byte order is swapped in memory compared to your hexadecimal on screen (since a pixel has historically been mapped to a 32-bit integer, even if nowadays we have alternatives).

Thus, BGRA is the native pixel layout in terms of memory, even if the corresponding hex representation for int32 would be 0xAARRGGBB.

https://chat.openai.com/share/1b58701c-4dd2-4896-911d-24e5f2cc9005

RGBA and ARGB are also commonly used on non-windows platforms. Thus, most image codecs will support decoding to all 3 (or more, for HDR).

I am aware of BGRA, but that happens when you either reinterpret the buffer of pixels that is in &[u8] to &[u32], you will get BGRA as your pixel type. At this point, it becomes implementation knowledge to know what you are doing.

When viewing them as individual bytes in a &[u8] , the red pixel comes first, followed by green and followed by blue, since the library tries to manipulate pixels as bytes, to it, red comes first, then green, then blue and finally alpha if we are doing rgba.

zune-image, for speed reasons de-interleaves the whole pixel array to allow for parallel processing of different channels, this means in it's memory, if the colorspace is rgb, the first channel contains R, second G, and third B.

Well, in windows memory the individual bytes are ordered B,G,R,A. Which is why they have the same layout in the windows bitmap format. If possible, it would be nice to have a flag that preserves that order.

I can add a flag to do that, will do it over the weekend.

First an update, all supported images decode to rgb now.

Second would a separate pass on rgba to bgra satisfy the need to ensure bgra output, It can be done via simd. I don't want to add flags because bmps have different configs, e.g palette images, 16 bit images (packed format), 32 bit images with different bitconfigurations, (see https://entropymine.com/jason/bmpsuite/bmpsuite/html/bmpsuite.html), so ensuring each can either do rgb(a) or bgr(a) is a lot of work and a maintenance burden, but having it as a separate entity means that it's just a different pass. And BMP isn't used for huge images so I don't think it would be a bottleneck

Added initial support for this, using const generics so unnecessary paths should be optimized out (not yet confirmed), 0.5.0-rc3 should have this.

I put it in a feature flag to reduce code bloat.

In Cargo.toml

zune-bmp={version="0.5.0-rc3", features=["rgb_inverse"]}

In code, call

let mut decoder = BmpDecoder::new();
decoder.preserve_bgra(true);
// other things

Hi, any more concerns?

The redesign in the IO interface requires some complex mapping to my own IO traits, and I haven't gotten that solved yet.