GridFour as a framework for raster processing
Closed this issue · 28 comments
A kind community member, @micycle1, pointed us to the TinFour project. We are studying it as a replacement for the Poly2Tri library in H2GIS. We also discovered the GridFour library, which seems very promising for dealing with rasters.
Many years ago, we developed the Grap library (https://github.com/orbisgis/grap), which offers IO and algorithms to process raster data (ASCII grids, worldfile images). Grap is based on ImageJ. It works well for small rasters but poorly with large data, because the raster is processed entirely in memory.
Some time ago, we looked into improving the Grap lib. There are several directions: the JAI framework (a bit hard to code algorithms against, and it seems abandoned), ImgLib2 + Bio-Formats (also hard, with many dependencies), GridCoverage and/or JAI-EXT (nice support for nodata values, but based on JAI), Apache Commons Imaging (easy to use, but the TIFF driver is limited and there is no tile/block system)...
If I understand the GridFour objectives correctly, you plan to share with the community a library to store, manage, and process large raster data with a unified API and data R/W IO.
Do you think that GridFour can be used as a framework to write raster algorithms and process large data (GeoTIFF or ASCII) with a limited memory impact? If yes, we would be interested in contributing, testing, and porting some algorithms we have.
Best
One of the basic ideas of Gridfour is to work toward simplicity and also to keep the library small so that it could be integrated into other applications. One of the things that I think I got right was the ability to access subsets of a raster on a random-access basis with a relatively simple API. So, in answer to your question, I believe that Gridfour could have a role as a behind-the-scenes component in a raster-processing application.
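The random-access idea can be illustrated with a toy sketch (invented names; this is not the actual Gridfour API, just the tile-on-demand pattern it embodies):

```java
import java.util.HashMap;
import java.util.Map;

/** Toy virtual raster: values are fetched tile-by-tile on demand,
    so only the tiles actually touched occupy memory. */
class ToyVirtualRaster {
    static final int TILE = 4;                 // tile width/height in cells
    final int rows, cols;
    final Map<Long, float[]> cache = new HashMap<>();
    int tilesLoaded = 0;                       // how many tiles were faulted in

    ToyVirtualRaster(int rows, int cols) { this.rows = rows; this.cols = cols; }

    float get(int row, int col) {
        long key = ((long) (row / TILE) << 32) | (col / TILE);
        float[] tile = cache.computeIfAbsent(key, k -> loadTile());
        return tile[(row % TILE) * TILE + (col % TILE)];
    }

    // Stand-in for reading a tile from disk; a real store would seek and read here.
    float[] loadTile() { tilesLoaded++; return new float[TILE * TILE]; }
}
```

A real implementation also needs an eviction policy so the cache stays bounded, which is the part that makes "limited memory impact" possible.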
I also have no illusions of it serving as a competitor to NetCDF or other existing data distribution formats.
In terms of support for ESRI-style ASCII files, Gridfour could be used as a way of managing huge rasters without keeping them entirely in memory. The ASCII format is pretty simple and it would be easy to write a few small classes that support it. I wrote some demo classes for the Tinfour project, but they were only suitable for in-memory applications. The only reason that I haven't written something for Gridfour, so far, is just that nobody has asked... And, also, I wouldn't have had a good way of testing or verifying them.
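For a sense of how simple the format is, here is a minimal sketch of such a reader, assuming the standard six-line header (a toy, in-memory version; a Gridfour-backed one would stream the values into tiles instead):

```java
import java.util.Locale;
import java.util.Scanner;

/** Minimal ESRI ASCII grid reader: six header lines, then row-major values. */
class AsciiGrid {
    int ncols, nrows;
    double xllcorner, yllcorner, cellsize, nodata;
    float[] values;

    static AsciiGrid parse(String text) {
        AsciiGrid g = new AsciiGrid();
        Scanner s = new Scanner(text).useLocale(Locale.US);
        // Each header line is a keyword followed by a number, e.g. "ncols 3".
        s.next(); g.ncols = s.nextInt();
        s.next(); g.nrows = s.nextInt();
        s.next(); g.xllcorner = s.nextDouble();
        s.next(); g.yllcorner = s.nextDouble();
        s.next(); g.cellsize = s.nextDouble();
        s.next(); g.nodata = s.nextDouble();
        g.values = new float[g.ncols * g.nrows];
        for (int i = 0; i < g.values.length; i++) {
            g.values[i] = s.nextFloat();   // a scalable version would hand
        }                                  // these off tile-by-tile
        return g;
    }
}
```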
The GeoTIFF specification is, of course, very elaborate, and I think writing code to support it as part of Gridfour would be too much to take on. So a separate library is needed to handle it. I myself use the Apache Commons Imaging project.
In terms of very large GeoTIFFs, I've been working with the Apache Commons Imaging project to make some progress in that regard. If you take a look at JIRA tracker item IMAGING-271, I describe what I have in mind. I actually modified a version of Commons Imaging for my employer to support the idea I describe in the JIRA item and it works pretty well with images of 20000 by 20000 pixels... Unfortunately, that code is proprietary to my employer, so I am going to have to write a separate version for release as open-source software (which is not necessarily a bad thing, since I have refined my ideas since I wrote the first version).
I've also worked with the Commons Imaging folks to enable their library to read elevation GeoTIFFs that give floating-point values. There is a write-up about it at Elevation Data from Cloud-Optimized GeoTIFFs. Imaging still needs some improvements to be able to read the integer-based global elevation products such as SRTM. I've got a hacked version of the Imaging API that I used to obtain SRTM data for my web article Lossless Compression for Raster Data using Optimal Predictors, but the modified code isn't really ready for distribution (to see how the SRTM data was used in the article, look for the references to the Paducah, Pontypridd, and Padua data).
Anyway, one of the things that has stalled progress on Gridfour is that I don't really have a user base or a meaningful sense of what would be the requirements for the package. So your feedback and suggestions would certainly be welcome.
Gary
P.S. Do you have any information on your plans for your project? I'd like to learn a bit more about what you guys are doing. It sounds pretty interesting.
Also, to get a better sense of whether the Gridfour API would be useful for your needs, you might want to take a look at the performance data at GVRS Performance. I don't think I'm going to be able to improve on it much, so that's what you'd have to work with.
Dear @gwlucastrig ,
Thank you for your explanations and the detailed GridFour documentation. I have several issues to solve on my roadmap.
- Redesign the raster datatype implementation in the H2 database, used by the H2GIS extension. The idea is to support the WKTRaster (https://trac.osgeo.org/postgis/wiki/WKTRaster/SpecificationWorking01) object as in PostGIS. A PR was sent to the H2 community (h2database/h2database#945). It depends on the Java RenderedImage and AWT APIs. We must use a more "generic" API so people are welcome to use JAI processes, ImageIO drivers, or other implementations.
- Define, find, use, or build a library that makes it possible to easily and efficiently:
  - Read/write common image formats (GeoTIFF, worldfile image, ASCII, NetCDF) without memory limitation
  - Write and use raster algorithms. I mean the main raster operations (as defined by the Map Algebra language) and surface analysis (grid flow accumulation, slope, watershed extraction, shading...)
  - Apply color palettes (linear, log...) to produce nice representations
The raster lib must be able to manage spatial reference systems (we have a small lib for this job, https://github.com/orbisgis/cts), take into account nodata values or masks, and support the integer, double, and float data types. The IO drivers and algorithms must be pluggable. The raster lib must have a minimal set of dependencies: perhaps a core module (no deps) and an IO module with a dependency on Apache Commons Imaging...
If you think these objectives are in line with GridFour's goals, we would be happy to contribute to your project: do tests, give feedback, think about the design for the user, and port the algorithms we have in Grap.
I suspect that if H2GIS were to use Gridfour, it would be wrapped in an API and mostly not visible to calling applications. Coordinate transformations, for example, might fall strictly into the domain of H2GIS code. Although it is possible the Gridfour API could also provide methods to allow application code to register a transformation class to be used with Gridfour's set and get methods.
Support for double and float would not be a problem. Currently, I have only implemented float, though the code could be extended to support double. The only reason double was not supported was that I had not encountered a data set that required that level of precision. Support for short integers (16-bit) could also be considered if that was a requirement.
The use of pluggable I/O drivers is a bit trickier. That feature is not currently supported, though it could be implemented. However, there might be some challenging details. Gridfour depends on the ability to randomly access its underlying files. Particularly when writing data, it needs to be able to support writing and over-writing different locations within the body of the file. And when reading a file, it often has to access different locations depending on the pattern-of-access used by the application. I think I could provide public methods to allow application code to register an alternate I/O driver with the code. But I will have to study the I/O drivers you are thinking of using so that I can let you know if that's feasible. Could you provide some details?
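To make the discussion concrete, here is one hypothetical shape such a driver contract could take (these names are invented for illustration; nothing like this exists in Gridfour today). The essential requirement is exactly what is described above: seeking, reading, and over-writing arbitrary positions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical contract a pluggable backing store would have to satisfy:
    Gridfour needs to seek, read, and over-write arbitrary positions. */
interface RandomAccessStore extends AutoCloseable {
    void seek(long position) throws IOException;
    void readFully(byte[] buffer) throws IOException;
    void write(byte[] buffer) throws IOException;
    long length() throws IOException;
}

/** Default driver backed by an ordinary file. */
class FileStore implements RandomAccessStore {
    private final RandomAccessFile raf;
    FileStore(String path) throws IOException { raf = new RandomAccessFile(path, "rw"); }
    public void seek(long p) throws IOException { raf.seek(p); }
    public void readFully(byte[] b) throws IOException { raf.readFully(b); }
    public void write(byte[] b) throws IOException { raf.write(b); }
    public long length() throws IOException { return raf.length(); }
    public void close() throws IOException { raf.close(); }
}
```

A sequential-only driver (a pipe, an HTTP stream) could not satisfy this contract when writing, which is where the "challenging details" would show up.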
Rendering and managing palettes falls into the domain of application code. I've written image re-projection code for my employers. We use it for mapping aerial photographs, scanned maps, satellite images, GRIB data, etc. It works pretty well, but it is proprietary code and I cannot provide it to the public. In the long term, I was thinking of writing new code for the Gridfour project. I was also thinking about writing contouring code similar to what I did for Tinfour. But that's nothing I can commit to in the near term.
> I suspect that if H2GIS were to use Gridfour, it would be wrapped in an API and mostly not visible to calling applications.
I'm not sure I understand, but, for example, H2GIS uses JTS to write functions exposed to the H2 SQL engine, and the whole JTS API is available. By the way, if GridFour is able to solve the issues I have mentioned and uses clear GIS concepts, we will use the GridFour API directly.
I'm thinking of something like this (in my head, on the fly ;-)):

```java
GeoTiffReader geoTiffReader = new GeoTiffReader();
geoTiffReader.getMetadata().getWidth();
...
GridFourParameters parameters = new GridFourParameters();
parameters.setTileCacheSize(GridFourCacheSize.Large);
parameters.setTileSize(200, 200);
parameters.setSubSample(2);
...
Raster raster = geoTiffReader.read(parameters);
raster.forEach(cell -> {
    // process each cell
});
```
> Coordinate transformations, for example, might fall strictly into the domain of H2GIS code. Although it is possible the Gridfour API could also provide methods to allow application code to register a transformation class to be used with Gridfour's set and get methods.
Seems a good way.
> Support for double and float would not be a problem. Currently, I have only implemented float, though the code could be extended to support double. The only reason double was not supported was that I had not encountered a data set that required that level of precision. Support for short integers (16-bit) could also be considered if that was a requirement.
Good
> Rendering and managing palettes falls into the domain of application code. I've written image re-projection code for my employers. We use it for mapping aerial photographs, scanned maps, satellite images, GRIB data, etc. It works pretty well, but it is proprietary code and I cannot provide it to the public.
But now you have all the knowledge and experience to build code open to the community, and I'm pretty sure many people would be interested. :-)
The Java geospatial community lacks an efficient, simple, light, and scalable library. There are many components, but they are scattered.
Maybe we could put together a prioritized "shopping list" for things that would have to get done with Gridfour so you could use it.
Also, thanks for pointing out the CTS library to me. I've been looking for something like that for a while. I think it will be quite useful for some of my work.
Gary
A "shopping list" is a good idea. We could build a POC. I have this data flow in mind:
1. Read a GeoTIFF.
2. Iterate the pixels and compute, for example, a flow direction (https://richdem.readthedocs.io/en/latest/flow_metrics.html#d8-o-callaghan-and-mark-1984) and a flow accumulation (https://richdem.readthedocs.io/en/latest/flow_accumulation.html).
3. Write the result to a new GeoTIFF.
We should test both a small GeoTIFF file and a large one to exercise the tile/block mechanism.
The flow accumulation algorithm is a good candidate to test the performance of tile/block recycling.
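For reference, the D8 step itself is small; here is a toy, purely in-memory sketch of the steepest-descent rule from O'Callaghan and Mark (1984). The interesting question for Gridfour is running this over tiles rather than arrays:

```java
/** D8 flow direction, minimal sketch: each cell drains to the
    steepest-descent neighbor of the eight surrounding cells. */
class D8 {
    // Neighbor offsets in order: E, SE, S, SW, W, NW, N, NE
    static final int[] DR = { 0, 1, 1, 1, 0, -1, -1, -1 };
    static final int[] DC = { 1, 1, 0, -1, -1, -1, 0, 1 };

    /** Returns the neighbor index 0..7 the cell drains to, or -1 for a pit/flat. */
    static int direction(double[][] z, int r, int c) {
        int best = -1;
        double bestDrop = 0;
        for (int k = 0; k < 8; k++) {
            int nr = r + DR[k], nc = c + DC[k];
            if (nr < 0 || nc < 0 || nr >= z.length || nc >= z[0].length) continue;
            // Diagonal neighbors are farther away, so divide by sqrt(2).
            double dist = (DR[k] != 0 && DC[k] != 0) ? Math.sqrt(2) : 1.0;
            double drop = (z[r][c] - z[nr][nc]) / dist;
            if (drop > bestDrop) { bestDrop = drop; best = k; }
        }
        return best;
    }
}
```

Flow accumulation then follows these directions downstream, which is exactly the scattered-access pattern that would stress a tile cache.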
Does this work for you?
Erwan
This seems like an interesting test. Do you have specific survey locations or files in mind?
Dear Gary,
I have some large GeoTIFF images (3-band remote sensing images, up to 2 GB) and a DEM.
I can share them with you.
How do you want to conduct this test?
I preface my remarks by saying that I've been interested in the problem of analyzing surface water flow for a very long time. That being said, I think a proper treatment would be very challenging and time-consuming to implement.
Now to the problem at hand... It seems to me that the goal of this effort would be to explore the suitability of the Gridfour library for future use in H2GIS. I would write some example code that did some basic analysis and you could see whether it looked promising. Based on your initial assessment, we could discuss future work.
A couple considerations would influence this effort:
- My time is limited right now, even more so than usual. Gridfour is a personal project, and my professional commitments do take priority (I'm afraid the people who pay me would insist).
- The code I wrote would have to be simple enough that you could read it without undue effort. Limiting the complexity also means limiting the capability and, perhaps, also omitting some performance enhancements. Since it would be intended as example code, I would strive for clarity of implementation.
- The code would have to do something interesting enough to make it worth running.
- I am not focused on the final products, but providing you with something you could run yourself, modify as you please, and share as desired.
So, basically, I would code up a demo program to simply read your source files, run a basic analysis over the grid, and store the results. I've got some basic calculations for slope and curvature (contour curvature, tangential curvature, etc.) that I could code. I'll look at the flow-accumulation article you cited.
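As an illustration of the kind of calculation involved, slope can be sketched with simple central differences (a common textbook method, simpler than Horn's eight-neighbor formula; this is not Gary's code):

```java
/** Slope from a DEM via central differences on a 3x3 neighborhood. */
class Slope {
    /** Slope in degrees at interior cell (r, c); cellSize must be in
        the same units as the elevations. */
    static double degrees(double[][] z, int r, int c, double cellSize) {
        double dzdx = (z[r][c + 1] - z[r][c - 1]) / (2 * cellSize);
        double dzdy = (z[r + 1][c] - z[r - 1][c]) / (2 * cellSize);
        return Math.toDegrees(Math.atan(Math.hypot(dzdx, dzdy)));
    }
}
```

Curvature works the same way, just with second differences over the same window, so both map naturally onto a tiled raster API.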
Erwan,
I did not hear from you regarding this discussion. Is there any further interest in this project?
Thanks.
Gary
Hi Gary,
My apologies for the long silence. I was busy dealing with the fire in a data center in France that destroyed our infrastructure.
I am still interested in GridFour. I will be OoO until the end of August.
When I'm back ;-) I will restart our discussion.
Best regards
Erwan
Hi @gwlucastrig ,
Sorry, I was too busy the last few weeks. I'm back now and I will be happy to collaborate, share data, tests and other kinds of things with you.
Best wishes,
Erwan
Hi Erwan,
Good to hear from you.
I'm looking forward to learning more about your project. The Gridfour development is still in its early stages, and I am still collecting requirements for the software. Learning more about different projects is often a good source of information about things the software ought to support.
A case in point... For the last few weeks, I've been working on a project that involved combining 53 high-resolution bathymetry TIFF files (grid cell sizes of 4 and 8 meters) into a single Gridfour file. That data was then combined with other data sets to create a shaded-relief map. Doing so led me to take a close look at the way cell coordinates are computed. As a result of what I found, I will be updating the Gridfour software in the near future.
Anyway, I hope that the Gridfour system will prove useful to you in your work. Let me know if you have specific interests or questions.
Gary
P.S. Here's an example of the bathymetry imagery I've been working on.
Hi Gary,
Just started a document to collect information: https://semestriel.framapad.org/p/gridfour_rasterprocessing-9pzo
Feel free to edit it.
Erwan
Thanks. I will take a look this evening.
I think the concept described in your document is feasible using Gridfour.
I think my basic approach would be to define two separate "modules" that produce two separate jar files that could be used as supporting APIs within your applications.
- "GridfourCore" The core Gridfour classes. This module is already defined in the current project. It would be extended to support your requirements.
- "GridfourGis" Support for GIS with image-processing, map projections, datums, and support for geospatial data formats not currently defined within the Core project.
The division of the API into these two modules is due to the following considerations:
- Gridfour Core is not meant to be a GIS solution but rather a low-level API for handling raster data in any kind of application, including those that have nothing to do with GIS.
- Part of what makes Core appropriate for integration into other applications is that it does not have any external dependencies. This is a strict rule I have followed in implementing the code. In fact, Gridfour-Core doesn't even use any of the logging frameworks (log4j, slf4j, etc.) or things like Apache Commons IO.
- Supporting GIS operations would almost certainly include dependencies on external projects such as map projections, datum transforms, Shapefile and GeoTIFF readers, etc. Some of these were identified in your document. Now writing map projection code is something that I enjoy doing (I've done it a lot for my job), but I don't think the world needs yet another map projection package.
I'm going to read through your document some more and think about the approach for the next couple of days.
Back in March, you mentioned that you had
> some large GeoTIFF images (3-band remote sensing images, up to 2 GB) and a DEM.
Do you still have these, and if so, are they some place I can get them? I could take a look at their content and see how Gridfour handles them.
Gary
P.S. Just as a clarification: I recently refactored some of my code changing the name of my API from "G93" to "GVRS" (for "Gridfour Virtual Raster System"). It's all the same concepts, just a different name.
> I think the concept described in your document is feasible using Gridfour.
> I think my basic approach would be to define two separate "modules" that produce two separate jar files that could be used as supporting APIs within your applications.
> 1. "GridfourCore" The core Gridfour classes. This module is already defined in the current project. It would be extended to support your requirements.
> 2. "GridfourGis" Support for GIS with image-processing, map projections, datums, and support for geospatial data formats not currently defined within the Core project.
Excellent!
This is what I have in mind: split GridFour into modules, one per usage.
> The division of the API into these two modules is due to the following considerations:
> 1. Gridfour Core is not meant to be a GIS solution but rather a low-level API for handling raster data in any kind of application, including those that have nothing to do with GIS.
> 2. Part of what makes Core appropriate for integration into other applications is that it does not have any external dependencies. This is a strict rule I have followed in implementing the code. In fact, Gridfour-Core doesn't even use any of the logging frameworks (log4j, slf4j, etc.) or things like Apache Commons IO.
> 3. Supporting GIS operations would almost certainly include dependencies on external projects such as map projections, datum transforms, Shapefile and GeoTIFF readers, etc. Some of these were identified in your document. Now writing map projection code is something that I enjoy doing (I've done it a lot for my job), but I don't think the world needs yet another map projection package.
> I'm going to read through your document some more and think about the approach for the next couple of days.
> Back in March, you mentioned that you had
> some large GeoTIFF images (3-band remote sensing images, up to 2 GB) and a DEM.
There is now a link on the shared doc with some data :-)
> Do you still have these, and if so, are they some place I can get them? I could take a look at their content and see how Gridfour handles them.
I've started investigating the images you provided. The downloads are quite large and take a long time to pull across the Atlantic. So far, I've obtained Nantes_est_lambert2.tif. I still need to download Nantes_Ouest_lambert2.tif.
The first step in working on this project is to confirm that I can read your data correctly. I was able to read the image using the Apache Commons Imaging library. I recently contributed some improvements to Imaging to support images with PlanarConfiguration=2 (Planar). The Nantes_est_lambert2.tif file used that configuration. I have included a picture of the content. Please let me know whether the color-interpretation is correct.
@gwlucastrig
Excellent! The color interpretation is absolutely correct.
Did you encounter any difficulties drawing this image? Memory consumption, time to iterate the pixels and build the BufferedImage...
I'll continue to update the doc, and I will share more images with you.
Cheers
There were no problems processing the images. I ran a simple Java application that printed the GeoTIFF tags and then transcribed the image to a JPEG file. I tested using a medium-quality laptop computer with a solid-state drive. Reading the TIFF image into memory took 8.7 seconds for the Ouest image and 8.8 seconds for the "est" image. JVM memory use was about 2 gigabytes. The test program read the entire image into memory and then wrote the full-size image to disk.
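A similar read-then-transcode test can be approximated with only the JDK (javax.imageio reads and writes baseline TIFF since Java 9), as a stand-in for the Commons Imaging-based program described above:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

/** Transcode a TIFF to JPEG, reading the whole image into memory first,
    matching the shape of the timing test described above. */
class Transcode {
    static BufferedImage tiffToJpeg(File tiff, File jpeg) throws Exception {
        BufferedImage image = ImageIO.read(tiff);       // full image in memory
        // JPEG has no alpha channel, so draw onto an opaque RGB canvas first.
        BufferedImage rgb = new BufferedImage(
            image.getWidth(), image.getHeight(), BufferedImage.TYPE_INT_RGB);
        rgb.createGraphics().drawImage(image, 0, 0, null);
        ImageIO.write(rgb, "jpeg", jpeg);
        return image;
    }
}
```

Note this is exactly the whole-image-in-memory approach whose limits motivated the discussion; the tiled alternative is what Erwan asks about next.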
I am curious. Were these multi-spectral images or were they from a single IR band?
Earlier you mentioned that you also had a DEM file. Will that be available in the near future?
A late answer, but I was travelling in Europe.
> There were no problems processing the images. I ran a simple Java application that printed the GeoTIFF tags and then transcribed the image to a JPEG file. I tested using a medium-quality laptop computer with a solid-state drive. Reading the TIFF image into memory took 8.7 seconds for the Ouest image and 8.8 seconds for the "est" image. JVM memory use was about 2 gigabytes. The test program read the entire image into memory and then wrote the full-size image to disk.
Excellent. Is there a mechanism in Apache Commons Imaging to read tiles or blocks without loading the whole image into memory?
> I am curious. Were these multi-spectral images or were they from a single IR band?
The image comes from https://spot.cnes.fr/en/SPOT/index.htm, version 5 of the satellite programme.
> Earlier you mentioned that you also had a DEM file. Will that be available in the near future?
Yes I will prepare some new data this week.
Thanks
Erwan
Hi Gary,
Here https://filesender.renater.fr/?s=download&token=21658898-50cb-4c7a-82dd-e57afb232d04 you will find a new set of images: a DEM, a topographic map, and two satellite images.
Best regards
Erwan
Erwan,
I am still working on the DEM you provided, but I had a question. The data seems to be in the range from -77 to 107 (this is a correction from my earlier post where I said 0 to 255). Is that range correct?
I read the data from the TIFF file using the Apache Commons Imaging library. The file you sent used a feature that Commons Imaging does not yet support, so I had to modify their code a bit to handle your sample. Commons Imaging does not currently handle numerical data with 32-bit integer samples, only 16-bit samples. So I had to make a quick code change. I believe I was successful, though I wanted to check with you.
My next step will be to store it as a Gridfour raster file with data compression and see how much storage it requires.
Here's a picture of the data as fetched from the file and rendered in gray tones. Is it consistent with your source data?
Also, here's a dump of the TIFF and GeoTIFF tags
```
Directory 0 Numeric raster data, description: Root
256 (0x100: ImageWidth): 2481 (1 Short)
257 (0x101: ImageLength): 2641 (1 Short)
258 (0x102: BitsPerSample): 32 (1 Short)
259 (0x103: Compression): 1 (1 Short)
262 (0x106: PhotometricInterpretation): 1 (1 Short)
273 (0x111: PreviewImageStart): 16440, 26364, 36288, 46212, 56136, 66060, 75984, 85908, 95
277 (0x115: SamplesPerPixel): 1 (1 Short)
278 (0x116: RowsPerStrip): 1 (1 Short)
279 (0x117: PreviewImageLength): 9924, 9924, 9924, 9924, 9924, 9924, 9924, 9924, 9924, 992
284 (0x11c: PlanarConfiguration): 1 (1 Short)
339 (0x153: SampleFormat): 2 (1 Short)
33550 (0x830e: ModelPixelScaleTag): 25.0, 25.0, 0.0 (3 Double)
33922 (0x8482: ModelTiepointTag): 0.0, 0.0, 0.0, 273987.5, 2291012.5, 0.0 (6 Double)
34735 (0x87af: GeoKeyDirectoryTag): 1, 1, 0, 20, 1024, 0, 1, 1, 1025, 0, 1, 1, 1026, -3079
34736 (0x87b0: GeoDoubleParamsTag): 46.8, 2.3372291667, 45.8989188889, 47.6960144444, 6000
34737 (0x87b1: GeoAsciiParamsTag): 'NTF_Lambert_II_étendu|GCS Name = NTF|Primem = Greenwic
42113 (0xa481: GDALNoData): '-9999' (6 ASCII)

Summary of GeoTIFF Elements ----------------------------
GeoKey Table
  key    ref   len  value/pos  name
    1      1     0      20     ~~~                            ~~~
 1024      0     1       1     GTModelTypeGeoKey              Projected Coordinate System
 1025      0     1       1     GTRasterTypeGeoKey             RasterPixelIsArea
 1026  34737    22   0 (A)     GTCitationGeoKey               NTF_Lambert_II_étendu
 2048      0     1   32767     GeographicTypeGeoKey           ~~~
 2049  34737    35  22 (A)     GeogCitationGeoKey             GCS Name = NTF|Primem = Greenwich|
 2050      0     1    6275     GeogGeodeticDatumGeoKey        See GeoTIFF specification
 2054      0     1    9102     GeogAngularUnitsGeoKey         Degrees
 2057  34736     1   7 (D)     GeogSemiMajorAxisGeoKey        6378249.2
 2059  34736     1   6 (D)     GeogInvFlatteningGeoKey        293.46602
 2061  34736     1   8 (D)     GeogPrimeMeridianLongGeoKey    See GeoTIFF specification
 3072      0     1   32767     ProjectedCRSGeoKey             User-Defined Projection
 3074      0     1   32767     ProjectionGeoKey               User-Defined
 3075      0     1       8     ProjCoordTransGeoKey           LambertConfConic_2SP
 3076      0     1    9001     ProjLinearUnitsGeoKey          Meter
 3078  34736     1   2 (D)     ProjStdParallel1GeoKey         45.8989
 3079  34736     1   3 (D)     ProjStdParallel2GeoKey         47.6960
 3084  34736     1   1 (D)     ProjFalseOriginLongGeoKey      2.3372
 3085  34736     1   0 (D)     ProjFalseOriginLatGeoKey       46.8000
 3086  34736     1   4 (D)     ProjFalseOriginEastingGeoKey   600000.0000
 3087  34736     1   5 (D)     ProjFalseOriginNorthingGeoKey  2200000.0000

ModelPixelScale
  2.5000000000e+01 2.5000000000e+01 0.0000000000e+00
ModelTiepointTag
  Pixel          Model
  0.0 0.0 0.0    273987.500 2291012.500 0.000
```
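The ModelTiepointTag and ModelPixelScaleTag above fully determine the georeferencing: pixel (0, 0) sits at (273987.5, 2291012.5) in the Lambert II étendu system, with 25 m square cells and model y decreasing as rows increase. A sketch of that affine mapping (no rotation terms, as in this file):

```java
/** Pixel-to-model mapping derived from the ModelTiepoint and
    ModelPixelScale tags above (tie point at pixel 0,0; no rotation). */
class GeoTransform {
    static final double ORIGIN_X = 273987.5, ORIGIN_Y = 2291012.5;
    static final double SCALE_X = 25.0, SCALE_Y = 25.0;

    static double modelX(double col) { return ORIGIN_X + col * SCALE_X; }
    static double modelY(double row) { return ORIGIN_Y - row * SCALE_Y; }  // y runs down
}
```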
I experimented with storing the DEM data in a Gridfour raster file using the optional Lewis-Smith Optimal Predictor (LSOP) data compression. This is a lossless data compression algorithm that is well suited to raster elevation data.
Compression required 1.298 bits per data sample. The resulting file was just over 1 megabyte in size. Your original, uncompressed TIFF file was 25.6 megabytes. But I can't claim that the compression results are quite as good as the numbers suggest: the source DEM file used 32-bit integers, a larger data type than the data actually required. Its range of values, -77 to 107, could easily have fit into 16 bits per sample.
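Those figures are consistent with the 2481 x 2641 dimensions in the tag dump; a quick back-of-the-envelope check:

```java
/** Sanity-check the compression figures quoted above. */
class CompressionCheck {
    static final long SAMPLES = 2481L * 2641L;           // 6,552,321 cells

    static double compressedMegabytes(double bitsPerSample) {
        return SAMPLES * bitsPerSample / 8 / 1.0e6;      // bits -> bytes -> MB
    }

    static double rawMegabytes(int bytesPerSample) {
        return SAMPLES * (double) bytesPerSample / 1.0e6;
    }
}
```

At 1.298 bits per sample this gives about 1.06 MB, matching "just over 1 megabyte"; the raw 32-bit grid comes to roughly 26 MB, in line with the quoted TIFF size (the small difference is down to headers and MB-vs-MiB rounding).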
Looking at the picture of the original data, I was a little curious about the dark spots that show up near the center of the picture. They appear to have atypically low values. Do you have any recommendations on how I should handle them?
This issue is closed for now. It may be addressed in future work.