holoviz/datashader

Shade missing values when rasterizing Points

Opened this issue · 2 comments

I am one of the developers of the UXarray package and we use Datashader and Holoviews for visualizing unstructured grids. The majority of our data is from climate model outputs, meaning that the data we are visualizing is mapped to the surface of a sphere and is projected for 2D visualization.

We support both Polygon and Point rasters, and one issue with the Point rasterization is that regions with lower point densities (such as the north and south poles when projected to 2D), lead to there being missing values near the poles. This can be seen when comparing the Polygon and Point rasters (look at the top and bottom)

image

image

While this is acceptable at high resolutions (the above grid is about 84 million points), the issue becomes much more noticable at lower resolutions

image

I'm wondering whether it would be possible to shade the missing values using some form of interpolation, such as nearest neighbor.

Here is a link to a sample dataset with some points to play with

gdf = gp.read_parquet("filename.parquet")

def plot(x_range=[-135,-65], y_range=[21,59]):
    cvs = ds.Canvas(plot_width=300, plot_height=300, x_range=x_range, y_range=y_range)
    agg = cvs.points(points, geometry='geometry', agg=ds.mean('temperature_500hPa'))
    return ds.tf.shade(agg, name=str((x_range,y_range)))

ds.tf.Images(plot((-110, -90), (35, 45)),
             plot((-105, -95), (37.5, 42.5)),
             plot((-102.5, -97.5), (39.25, 41.25)),
             plot((-101, -99), (39, 41)))
image

The ideal behavior would be to have no NaN values and for them to be shaded using some method (like nearest neighbor)

An example of how this can be done is:

import geopandas as gpd
import datashader as ds
import numpy as np
from scipy.interpolate import griddata


gdf = gpd.read_parquet("point_gdf.parquet")


def plot(x_range=[-135, -65], y_range=[21, 59]):
    cvs = ds.Canvas(plot_width=300, plot_height=300, x_range=x_range, y_range=y_range)
    agg = cvs.points(gdf, geometry="geometry", agg=ds.mean("temperature_500hPa"))
    return ds.tf.shade(agg, name=str((x_range, y_range)))


def plot_inter(x_range=[-135, -65], y_range=[21, 59]):
    img = plot(x_range, y_range)
    array = img.data
    non_zero_coords = np.argwhere(array != 0)
    non_zero_values = array[array != 0]

    # Get the coordinates of the zero elements
    zero_coords = np.argwhere(array == 0)
    interpolated_values = griddata(non_zero_coords, non_zero_values, zero_coords, method="nearest")

    # Replace the zeros with the interpolated values
    for (x, y), value in zip(zero_coords, interpolated_values):
        array[x, y] = value

    img.data = array
    img.name = img.name + " [interpolated]"
    return img


ds.tf.Images(
    plot_inter((-110, -90), (35, 45)),
    plot_inter((-105, -95), (37.5, 42.5)),
    plot_inter((-102.5, -97.5), (39.25, 41.25)),
    plot_inter((-101, -99), (39, 41)),
)

image

EDIT: was to quick with the code example, have update the code.