cogeotiff/rio-cogeo

invalid COG from GDAL 3.1.0 onwards

ungarj opened this issue · 12 comments

Not sure whether this is the right place to report this but it seems relevant as it can be reproduced using rio cogeo create and rio cogeo validate.

Use the files in validate.zip to reproduce:

rio cogeo create temp.tif cog.tif && rio cogeo validate cog.tif with GDAL 3.0.4 creates a COG and validation outputs this:

Reading input: /temp.tif
Adding overviews...
Updating dataset tags...
Writing output to: /cog.tif
/cog.tif is a valid cloud optimized GeoTIFF

whereas running the same command with GDAL 3.1.0 (also tested 3.1.1 and 3.1.2 with the same result) makes rio cogeo validate return an error:

Reading input: /temp.tif
Adding overviews...
Updating dataset tags...
Writing output to: /cog.tif
The following errors were found:
- The offset of the main IFD should be 8 for ClassicTIFF or 16 for BigTIFF. It is 192 instead
/cog.tif is NOT a valid cloud optimized GeoTIFF

Again, I guess the error rather has its origin in either GDAL itself or rasterio, however I think it is worth at least documenting it here as it affects COG creation using rio-cogeo.

I'll investigate further on the GDAL and rasterio issue trackers but do you have a clue where the issue could be?

👋 @ungarj
I'm getting the same results when using the GDAL commands

$ gdaladdo temp.tif -minsize 64
$ gdal_translate temp.tif temp_cog.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE

$ tiffdump temp_cog.tif 
Magic: 0x4949 <little-endian> Version: 0x2a <Cl
Directory 0: offset 192 (0xc0) next 718 (0x2ce)
...
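For readers who want to check the `offset 192` figure without tiffdump, here is a minimal sketch that reads the main IFD offset straight from the TIFF header. The function name `read_main_ifd_offset` is made up for illustration and is not part of rio-cogeo, rasterio, or GDAL; the header is built synthetically to mirror the values reported above.

```python
import struct


def read_main_ifd_offset(header: bytes):
    """Return (is_bigtiff, offset_of_first_ifd) from the leading bytes of a TIFF.

    Hypothetical helper for illustration only.
    """
    byte_order = header[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian
    elif byte_order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")

    (version,) = struct.unpack(endian + "H", header[2:4])
    if version == 42:
        # ClassicTIFF: 4-byte offset of the first IFD stored at byte 4
        (offset,) = struct.unpack(endian + "I", header[4:8])
        return False, offset
    if version == 43:
        # BigTIFF: 8-byte offset of the first IFD stored at byte 8
        (offset,) = struct.unpack(endian + "Q", header[8:16])
        return True, offset
    raise ValueError("unknown TIFF version: {}".format(version))


# Synthetic little-endian ClassicTIFF header pointing at offset 192,
# as in the tiffdump output above.
header = b"II" + struct.pack("<H", 42) + struct.pack("<I", 192)
print(read_main_ifd_offset(header))  # (False, 192)
```

On a real file, the first 16 bytes could be passed in, for example `open("temp_cog.tif", "rb").read(16)`.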

The validation script seems ok, so I don't really get why it's different.

Sadly I believe this is a GDAL problem!

Wait 👀 https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py#L183-L213

This might just mean that you can't use the validation script with COG created with gdal3.1 😭

😭 😭 😭

$ python3 validate_cloud_optimized_geotiff.py temp_cog.tif
temp_cog.tif is a valid cloud optimized GeoTIFF

The size of all IFD headers is 920 bytes

---> We can't validate COG created with gdal3.1

@ungarj, as pointed out by @geospatial-jeff on a private slack, this is because of the Ghost Area https://gdal.org/drivers/raster/cog.html#header-ghost-area

Looking at GDAL docs, I wasn't aware that ghost area would be present for GeoTIFF that were created without the new COG driver... this is either a GDAL/Libtiff bug or something expected but that should also be documented in the https://gdal.org/drivers/raster/gtiff.html page

cc @rouault

this is either a GDAL/Libtiff bug or something expected but that should also be documented in the https://gdal.org/drivers/raster/gtiff.html page

The COG driver is mostly a wrapper over the GTiff one. So if using the COPY_SRC_OVERVIEWS=YES creation option, the ghost area will be created.
I'm not sure what "rio cogeo validate" runs, but it should be updated with the logic of latest validate_cloud_optimized_geotiff.py

Thanks @rouault this is really helpful!

I'm not sure what "rio cogeo validate" runs, but it should be updated with the logic of latest validate_cloud_optimized_geotiff.py

rio cogeo validate is not updated with the latest change you added to the gdal script because sadly we (I mean in rasterio) don't have access to the bytes directly so we can't do https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py#L196-L208 😭
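To make the byte-level check concrete, the following is a rough sketch of the idea using plain Python bytes rather than GDAL's VSI API. It assumes the ghost area layout described in the GDAL COG driver docs: a 43-byte line of the form `GDAL_STRUCTURAL_METADATA_SIZE=XXXXXX bytes\n` placed right after the TIFF header, followed by that many bytes of metadata, with the IFD then aligned on a 2-byte boundary. The name `expected_main_ifd_offset` and the sample payload are illustrative, not the linked script's code.

```python
GHOST_PREFIX = b"GDAL_STRUCTURAL_METADATA_SIZE="
# Prefix + 6-digit size + " bytes\n" (assumed layout, 43 bytes in total)
GHOST_HEADER_LEN = len(GHOST_PREFIX) + len(b"000000 bytes\n")


def expected_main_ifd_offset(data: bytes) -> int:
    """Return where the main IFD is expected, allowing for a header ghost area.

    Hypothetical helper: `data` holds the first few hundred bytes of the file.
    """
    is_bigtiff = data[:4] in (b"II\x2b\x00", b"MM\x00\x2b")
    pos = 16 if is_bigtiff else 8  # end of the TIFF header

    head = data[pos:pos + GHOST_HEADER_LEN]
    if len(head) == GHOST_HEADER_LEN and head.startswith(GHOST_PREFIX):
        size_field = head[len(GHOST_PREFIX):len(GHOST_PREFIX) + 6]
        size = int(size_field.decode("ascii"))
        pos += GHOST_HEADER_LEN + size
        pos += pos % 2  # IFD starts on a 2-byte boundary
    return pos


# Synthetic ClassicTIFF: 8-byte header, then a ghost area declaring 140 bytes.
payload = b"LAYOUT=IFDS_BEFORE_DATA\n".ljust(140, b" ")
synthetic = (
    b"II\x2a\x00" + (192).to_bytes(4, "little")
    + GHOST_PREFIX + b"000140 bytes\n" + payload
)
print(expected_main_ifd_offset(synthetic))  # 192
```

With a 140-byte ghost area the expected position works out to 8 + 43 + 140 = 191, rounded up to 192, which matches the offset reported in the validation error.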

I'll see if I can PR rasterio to add such method.

don't have access to the bytes directly

you can probably relax the test on the location of the first IFD to be within the first 200 bytes or so
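A relaxed check along these lines might look like the sketch below. The function name `main_ifd_offset_error` and the default limit of 200 bytes are assumptions taken from the suggestion above, not rio-cogeo's actual implementation.

```python
def main_ifd_offset_error(offset, is_bigtiff=False, max_offset=200):
    """Return an error message if the main IFD is implausibly far from the header.

    Hypothetical helper: accepts any offset between the end of the TIFF header
    (8 for ClassicTIFF, 16 for BigTIFF) and `max_offset`, leaving room for a
    header ghost area, instead of requiring exactly 8 or 16.
    """
    min_offset = 16 if is_bigtiff else 8
    if min_offset <= offset <= max_offset:
        return None
    return (
        "The offset of the main IFD should be between {} and {}. "
        "It is {} instead".format(min_offset, max_offset, offset)
    )


print(main_ifd_offset_error(192))  # None
print(main_ifd_offset_error(718))
```

The trade-off is that this accepts any IFD within the window without confirming that a ghost area actually sits between the header and the IFD.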

Hi and thanks @vincentsarago and @rouault for the quick responses!

When I look at https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py#L196-L208 I see a problem that should be fixed in GDAL. That Python script is using GDAL's VSI API to work around a defect in the GDAL metadata API, no?

That Python script is using GDAL's VSI API to work around a defect in the GDAL metadata API, no?

well, this is really low level information, that is of little use except for the validation script, hence I didn't feel like adding it into metadata.

I would tend to agree with @sgillies that it will be nice to have this feature (accessing the hidden metadata) directly in GDAL. Not sure what's the easiest way forward.

Would be nice to just be able to do GetMetadataItem("GDAL_GHOST_AREA_METADATA") 🤷

Would be nice to just be able to do GetMetadataItem("GDAL_GHOST_AREA_METADATA")

See OSGeo/gdal#2832