earthlab/earthpy

Implement a reproject rasterio function

nkorinek opened this issue · 5 comments

While I was writing the mosaic lesson, I realized that in the reproject a raster lesson, there's a function written there to reproject rasters from one crs to another, written below:

def reproject_et(inpath, outpath, new_crs):
    dst_crs = new_crs # CRS for web meractor 

    with rio.open(inpath) as src:
        transform, width, height = calculate_default_transform(
            src.crs, dst_crs, src.width, src.height, *src.bounds)
        kwargs = src.meta.copy()
        kwargs.update({
            'crs': dst_crs,
            'transform': transform,
            'width': width,
            'height': height
        })

        with rio.open(outpath, 'w', **kwargs) as dst:
            for i in range(1, src.count + 1):
                reproject(
                    source=rio.band(src, i),
                    destination=rio.band(dst, i),
                    src_transform=src.transform,
                    src_crs=src.crs,
                    dst_transform=transform,
                    dst_crs=dst_crs,
                    resampling=Resampling.nearest)

The function is heavily influenced by the rasterio documentation on the topic, but still they don't implement it and it would be nice to have access to this function in a package so I don't have to copy/paste it from the website whenever I want to use it. I know it would take a lot to add this documentation/test wise, but could be a cool add anyways! Thought @lwasser ?

@nkorinek are you suggesting that we add a new function to earthpy that reprojects raster data?
@joemcglinchy in your exploration of spatial data in python - is the above approach how you've reprojected? Do you think a function in earthpy would be useful? or does something else exist somewhere?

nathan - let's see what joe says. i'm open to it but we'd want a few people to review / test it prior to implementing it officially in earthpy! Also that reproject lesson really needs some love. let's keep it on our radar to work on it.

I could see this being useful, but my main question is: what is the goal of the function? I could see a few of them, depending how it is implemented, thinking reproject(input, output, dst_crs):

  1. input and output are filenames, in which cases, we can make a GDAL call to write the reprojected raster to disk. Or use rasterio's functions, as it is using GDAL to do so, but is more pythonized and has error tracing vs. a subprocess call.
  2. input and output are rasterio dataset objects, which would look weird when given an example as you'd have nested with statements; why bother when it could go in a function?
  3. input and output are numpy arrays, in which case you would need more arguments such as the transform and metadata objects from the rasterio dataset.

if the goal of the function is (1), then we'll have to check the dst_crs is in the list of valid CRS, which i think we can do by catching any exceptions thrown by rasterio.

so to followup on this issue! we are moving to rioxarray which has a built in reproject function! i think we should just use that instead so i'll close this for the time being. @joemcglinchy your feedback above is really great! can you think of any reasons why just using rioxarray wouldn't work for us?

@lwasser i cannot. I think rioxarray is the way to go for this. Combining it with dask for larger data files was relatively easy to do, even if not as elegant as simply calling a reproject function. We'll have to be sure to emphasize that in whatever lesson covers or uses reprojection.

awesome!! great we are on the same page. thank you @joemcglinchy !! i'll go ahead and close this!! i have been slowly updating our lessons so that is good!!