georust/geotiff

Reviving the geotiff crate on top of image-tiff

weiji14 opened this issue ยท 14 comments

Hi there ๐Ÿ‘‹

I've been tinkering around with reading GeoTIFFs in Rust using the the tiff crate at https://github.com/image-rs/image-tiff, and see that this is possibly the recommended way going forward as mentioned at #5 (comment), #1 (comment), as well as new crates like @pka's georaster, and my experimental cog3pio repo (not published yet).

Personally, I'd like to help take over the maintenance of this repo if possible, to upstream all the pure Rust parts of GeoTIFF decoding, so that I can focus on just the PyO3 bindings (and someone mentioned at https://discord.com/channels/598002550221963289/963346313708195880/1213207827309396050 that they'd like to make R bindings). It seems worthwhile to have a GDAL-less GeoTIFF reader, and I'd be keen to get the ball rolling if possible.

If this sounds ok, I'll start to open up a few issues on what needs to be refactored/modernized, update the documentation, and slowly chip away at some maintenance tasks.

Ok, started some general maintenance tasks like setting up CI (#8), bumping to Rust edition 2021 (#9) and fixing some compilation warnings (#10).

@dominikbucher or @frewsxcv, could you review some of those PRs (especially #8) if you have time? Or I could ask someone on the GeoRust Discord about getting write permissions to the repo.

Hi @weiji14! Awesome, and thanks a lot for kicking this off! Yeah, sure, I'll try my best to give you (hopefully valuable ๐Ÿ˜ฌ) feedback on the PRs. And maybe I can carve off some time to work on it a bit as well, if there's interest. Of course also happy to have maintainers and contributors to the repo - I created it quite a while ago, but quickly had all the functionality I needed at the time, after which I stopped working on it.

I'll merge your PRs and change master to main (as per your github action workflow; just mentioning it to avoid confusion).

Thanks @dominikbucher! There's definitely wider interest on improving georust/geotiff (see discussion at busstoptaktik/geodesy#86), and I'm happy to contribute to refactoring parts of this repo to use the tiff crate, alongside @pka or any others who are interested.

but quickly had all the functionality I needed at the time, after which I stopped working on it.

Could you elaborate on this a little bit? I'm guessing that the main feature is the TIFF::get_value_at method, but was there any other functions you are relying on? I'm thinking of renaming the main struct from TIFF to GeoTIFF, so can't guarantee full backward compatibility, but will try to ensure there's an equivalent function somehow as we work towards a v0.1.0 release.

No, absolutely, feel free to change things - as it was never on crates.io I doubt many people will be bothered by broken backward compatibility. I used it personally in a project together with a student, in the larger context of routing for electric vehicles (https://github.com/dominikbucher/e-route), but directly linked to the local project. I think it was simply getting the altitude of points.

Erm, but I do see it on crates.io at https://crates.io/crates/geotiff ๐Ÿ˜… It seems like someone published it about three years ago?

Oh, ups, yeah that might have been as part of the transition to the georust org. Thanks for noticing!

@dominikbucher, would it be possible for you to grant me push permissions to this repo? Do I need to get on a sub-team under GeoRust? I see that you've approved my PRs #9 and #10, but I'm unable to merge them yet. Have some free time these few days and would love to spend some time on this repo.

I just started looking through the code, after having had a deep-dive into image/tiff. The first thing that comes to mind is: Why all the duplication?

  • in image/tiff, the approach they take towards decoding a directory is quite nice: The decoder reads in all the tags (including their data) and then the Image holds fully initialized tags, requiring no further reads to access any information. This does lead to all of a COG's data, especially tile offsets and lengths, being read in when an image is created from_reader()
    • The only (weird) thing is that it doesn't store the endianness, because that is ?also relevant? when decoding a chunk (apparently only for (u)ints).
  • RasterData: Is literally the same as DecodingResult. Why not keep working on the DecodingResult, which is also just a big enum around different types. I guess because we want to maybe add functions to it later? If I were here first, I think I'd do something like:
pub struct GeoTiff {
    images: std::Vec<tiff::Image> // currently private, but I think there is a very valid point for making it more public, it would make everything so much more ergonomic
    geo_info: GeoDirectory // COG spec says there is only one geodirectory in an image. I don't really see a reason for this crate to be the one that allows for rescuing weird tiffs
}

pub struct GeoDecoder<R> {
    // any code related to 
    pub fn decode_geo_tags(&mut self, geo_directory: [u16]) {
        // header reading logic
        self.geodirectory.insert(keys, values)
    }
}

impl GeoDecoder<R: Read + Seek> {
    pub fn new(r: R) {
        self.image = tiff::Decoder::new(r).image;
    }

    pub fn read_geo_tags_sync(&mut self) {
        let dir: Vec<u16> = self.decoder.find_tag_into_u16_vec(Tags::GeoKeyDirectoryTag)?
        decode_geo_tags(dir)
    }
}

impl GeoDecoder<R: AsyncRead + AsyncSeek + Unpin> {
    pub async fn read_geo_tags_async(&mut self) {
        // decoder-dependent impl
let dir: Vec<u16> self.decoder.find_tag_into_u16_vec_async(Tags::GeoKeyDirectoryTag).await?
        // decoder-independent impl
        let entry: Entry = self.image.ifd.get(Tags::GeoKeyDirectoryTag)? // hasmap lookup
        let dir = entry.val_async(self.reader, self.bigtiff, self.byte_order)
    }
}
pub struct GeoTiff {
    images: std::Vec<tiff::Image> // currently private, but I think there is a very valid point for making it more public, it would make everything so much more ergonomic

Yes, this โ˜๏ธ I had a similar thought with putting a (public) tiff:Decoder field in the GeoTiff struct at #17 (comment), but we need to support both read/write, so that won't work. If we can convince upstream to make the tiff::Image struct public, that would be terrific!

By the way, I just want to say that if you, @feefladder, @gschulze or anyone else wants to open PRs to trial various proof of concept things, please go ahead! I might be a bit busy until the end of the year, but happy to squeeze out time to review and merge PRs as needed, so don't let me slow down any progress here.

I'll open a PR over at image/tiff with example of awesome multithreaded reading made possible by exposing Image over here, shown below epic recording showing fast multithreaded stuff

@feefladder, @weiji14, I'm happy with any improvements that can be made to reduce code duplication. Given the currently released version of image-tiff, the solution in my PR was the best I could come up with.

@weiji14 I'm currently working on coordinate transformations on my fork. The implementation is mostly done; I need to cover it with tests, though. To facilitate collaboration, I wanted to ask: Is there any chance I can join the GeoRust organization and contribute directly to the repository?

@weiji14 I'm currently working on coordinate transformations on my fork. The implementation is mostly done; I need to cover it with tests, though. To facilitate collaboration, I wanted to ask: Is there any chance I can join the GeoRust organization and contribute directly to the repository?

I don't have maintain permissions on this repo so can't add you. Will need to ask @dominikbucher or one of the GeoRust org maintainers. Opened a thread at georust/meta#34 for this (if someone else in interested in joining, comment on that thread please!)

@weiji14 I'm currently working on coordinate transformations on my fork. The implementation is mostly done; I need to cover it with tests, though. To facilitate collaboration, I wanted to ask: Is there any chance I can join the GeoRust organization and contribute directly to the repository?

I don't have maintain permissions on this repo so can't add you. Will need to ask @dominikbucher or one of the GeoRust org maintainers. Opened a thread at georust/meta#34 for this (if someone else in interested in joining, comment on that thread please!)

@weiji14 Thank you!

Thanks guys for pushing this forward!! I added maintain rights to both of you - let me know if I can be of any other help! (As usual, I try to follow things and pray for more time to involve myself more actively ๐Ÿ™ˆ).