Simplify Draft 'pymd' Box
Opened this issue · 8 comments
Concerning the Image Pyramid Entity Group Box 'pymd' in WG03N1157_23524 HEIF Ed3 Amd2 Prelim WD:
The following four variables seem redundant and could be removed to greatly simplify the 'pymd' box:
- tile_size_x
- tile_size_y
- tiles_in_layer_row_minus1
- tiles_in_layer_column_minus1
pymd | ImageGrid |
---|---|
tiles_in_layer_row_minus1 | rows_minus_one |
tiles_in_layer_column_minus1 | columns_minus_one |
tile_size_x | (output_width / (columns_minus_one + 1)) |
tile_size_y | (output_height / (rows_minus_one + 1)) |
If the gridding is handled within the codec, then the codec specific boxes would be used instead. For example, the uncompressed codec would use the uncC box:
pymd | uncC |
---|---|
tiles_in_layer_row_minus1 | num_tile_rows_minus_one |
tiles_in_layer_column_minus1 | num_tile_cols_minus_one |
tile_size_x | (ispe image_width / (num_tile_cols_minus_one + 1)) |
tile_size_y | (ispe image_height / (num_tile_rows_minus_one + 1)) |
I imagine these variables were originally placed here for convenience. However, there are already well defined mechanisms for accessing them. In my opinion, duplicating them in the pymd box adds complexity for very little gain.
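As a rough illustration of how little work the derivation is, here is a minimal Python sketch. The function name `tile_layout` and its arguments are hypothetical; it assumes the image size is an exact multiple of the tile count and that the `*_minus_one` fields store the count minus one, which may not hold for a grid image that is implicitly cropped.

```python
def tile_layout(width, height, cols_minus_one, rows_minus_one):
    """Derive (cols, rows, tile_w, tile_h) from grid-style fields.

    Hypothetical helper: width/height stand for ImageGrid output_width/
    output_height or the ispe image_width/image_height; the *_minus_one
    values stand for the ImageGrid or uncC tile-count fields.
    """
    cols = cols_minus_one + 1
    rows = rows_minus_one + 1
    # Assumption: no implicit cropping, so the image divides evenly into tiles.
    if width % cols or height % rows:
        raise ValueError("image size is not an integer multiple of the tile count")
    return cols, rows, width // cols, height // rows


# Example: a 4096x2048 image split into 4 columns and 2 rows.
print(tile_layout(4096, 2048, 3, 1))
```

The explicit error for the uneven case is a design choice of this sketch: it makes the reader fall back to the tile images' own dimensions instead of silently returning a rounded value.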
We could even drop the layer_binning entry. It appears to be equivalent to ceil(base width / overview width) and is thus redundant. That would eliminate all custom data from the pymd box, leaving just the list of pyramid image item IDs, sorted by increasing resolution.
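A small Python sketch of that derivation, assuming the layer widths are known (for example from each item's ispe) and listed in order of increasing resolution, so that the last entry is the base layer. The function names are hypothetical.

```python
def derived_layer_binning(base_width, overview_width):
    """Integer ceil(base_width / overview_width) without floating point."""
    return -(-base_width // overview_width)


def binning_per_layer(widths):
    """widths: layer widths sorted by increasing resolution (base layer last)."""
    base_width = widths[-1]
    return [derived_layer_binning(base_width, w) for w in widths]


# Example pyramid with 2x steps between layers.
print(binning_per_layer([512, 1024, 2048, 4096]))
```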
Another reason for leaving out layer_binning is that it restricts us to power-of-two integer scaling factors only, while I think it would be perfectly fine to also have finer resolution granularity, like a factor
Is there any important reason for the restriction that every pyramid level should have the same tile size? I think this is overly restrictive. There are good reasons why tile sizes would vary between layers. For example, the base layer might hold the original uncompressed data, while overview layers use h265 with a tile size dictated by the encoder hardware. Or building the lower resolutions might be much easier if it can be done for each base-level tile separately.
If it is important for an application to have the same tile size in each level, the encoder is free to do so even without enforcing this, and having similar tile sizes could be signaled as a flag (but it's also obvious when looking at the individual pyramid levels).
FYI, the layer_binning isn't restricted to power-of-two scaling. It is restricted to integer factors. For instance, the examples in the document show 2x2 binning as a layer_binning of 2 and 4x4 binning as a layer_binning of 4. One can also choose to implement a 3x3 binning as a layer_binning of 3, etc.
> the layer_binning isn't restricted to power-of-two scaling
Right. This was a semantic typo. I meant integer scale factor. I've corrected it above.
> I imagine these variables were originally placed here for convenience
Indeed, the purpose was for the reader to have all the information it needs in a single place. Depending on which kinds of overviews compose the pyramid, the information may be more or less complicated to gather (grid, tiled coded image with constrained extents, uncompressed image). For instance, for a grid image the example above is not totally correct because of possible implicit cropping. To be accurate, you would have to get the width/height of the input images.
I agree it is worth simplifying in some cases. What about using a flag so that this information is only conditionally present?
A conditional flag is one option, but I think that the "difficulty" of getting the information is not a strong argument. Any decoder that is able to decode the images obviously also knows how to get the image sizes. And from there, all parameters can be trivially computed. Furthermore, it is not so clear why anyone other than the decoder would be interested in the number of tiles in each layer.
I can see two use-cases:
- A multiplexing/demultiplexing application might want to know the size of each layer without being able to decode it. For example, a network server might want to know which resolution layer to stream. This could be solved by making the ispe property mandatory when an image is used in a pymd. The ispe is usually present anyway, so this will not be a big change, and duplicate (and thus possibly inconsistent) information is avoided.
- It might be useful to know whether each resolution layer uses the same tile size. This would be a good case for your conditional flag. If the flag is set, the pymd is restricted to use the same tile size in every layer and the tile size is indicated in the pymd header.
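To make the first use-case concrete, here is a minimal Python sketch of how a server could choose a layer from ispe sizes alone, without decoding anything. The function `pick_layer` and the `(item_id, width, height)` tuples are assumptions made for illustration; the only thing taken from the proposal is that the pymd lists the items by increasing resolution and that each item carries an ispe.

```python
def pick_layer(layers, target_width):
    """Return the item ID of the smallest layer at least target_width wide.

    layers: list of (item_id, width, height) taken from each item's ispe,
            sorted by increasing resolution as in the pymd entity list.
    Falls back to the highest-resolution layer if none is wide enough.
    """
    for item_id, width, _height in layers:
        if width >= target_width:
            return item_id
    return layers[-1][0]


# Hypothetical three-level pyramid.
pyramid = [(3, 512, 256), (2, 1024, 512), (1, 2048, 1024)]
print(pick_layer(pyramid, 800))
```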
Thus, the specification would be:
- ispe is mandatory for each layer image
- the header is defined like this:
aligned(8) class ImagePyramidEntityGroup
    extends EntityToGroupBox ('pymd', version = 0, flags) {
    if (flags & 1) {
        unsigned int(16) tile_size_x;
        unsigned int(16) tile_size_y;
    }
}
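For reference, a reader for the box as proposed here could look roughly like the Python sketch below. It is a sketch of this thread's proposal, not of any published text. It assumes the usual EntityToGroupBox layout (FullBox header, then 32-bit group_id, 32-bit num_entities_in_group and 32-bit entity IDs) and a plain 32-bit box size; `parse_pymd` is a hypothetical name.

```python
import struct


def parse_pymd(box: bytes):
    """Parse a 'pymd' box laid out as proposed above (32-bit size only)."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"pymd":
        raise ValueError("not a pymd box")
    # FullBox header: 8-bit version followed by 24-bit flags.
    (version_flags,) = struct.unpack_from(">I", box, 8)
    version = version_flags >> 24
    flags = version_flags & 0xFFFFFF
    # EntityToGroupBox body: group ID, entity count, then the entity IDs.
    group_id, num_entities = struct.unpack_from(">II", box, 12)
    entity_ids = list(struct.unpack_from(f">{num_entities}I", box, 20))
    offset = 20 + 4 * num_entities
    tile_size = None
    if flags & 1:
        # Proposed optional fields: one tile size shared by all layers.
        tile_size = struct.unpack_from(">HH", box, offset)
    return {
        "size": size,
        "version": version,
        "group_id": group_id,
        "entity_ids": entity_ids,
        "tile_size": tile_size,
    }


# Build a sample box: group 7, items 10 and 11, shared 512x512 tiles.
sample = struct.pack(">I4sIII", 32, b"pymd", 1, 7, 2)
sample += struct.pack(">II", 10, 11) + struct.pack(">HH", 512, 512)
print(parse_pymd(sample))
```

With the flag cleared, the box carries nothing beyond the ordered item IDs, and `tile_size` comes back as `None`.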
AMD1 of HEIF has been revised. We invite experts to have a new look at the text.