angus-g/lagrangian-filtering

Support land masking

angus-g opened this issue · 3 comments

This probably isn't considered as an OceanParcels use case, but for the filtering, non-masked land can give strange results. We should support passing a land mask through so that non-velocity variables aren't interpolated incorrectly over land, instead ending up masked in the result.

Some thoughts on this:

I believe we basically want to delete any particles that end up within the masked area, which should result in them being assigned no value in the output. This could be done by masking the velocity field, which is a bit tricky without delving into the plumbing of parcels to pre-multiply velocity (or all) data by the mask from the Field object.

A potentially more feasible alternative is to provide the mask as a static field (i.e. no time component), with a kernel pre/post advection which deletes particles that are in the masked region. This seems more natural/intuitive, however parcels doesn't have a concept of a static field. We might be able to enable extrapolation for the mask field, and provide a single field with the timesteps argument.

Here's a little example I'm working with, which is a constant zonal flow with a masked island in the middle. I've had to make a tiny change to parcels to map math.nan to NAN rather than nan(). However, this doesn't work for a second reason: the SoA particles don't support multiple grids. Introducing the mask variable with a length-1 time dimension gives us a second grid, i.e. len(fs.fieldset.gridset.grids) = 2. The parcels code expects this, for example calling search_time_index(..., &ti[igrid], ...), but I've only made the search indices a linear array. This probably requires some special code in parcels' particleset.py, to match that in JITParticle to create indices with the right dimension.

import filtering
import numpy as np
import xarray as xr

# 100km grid at 1km resolution
x = np.linspace(0, 100e3, 100, endpoint=False) + 500
y = np.array(x)
t = np.arange(4*24)*3600
Y, X = np.meshgrid(y, x, indexing="ij")

# mask off a circular island in the centre
island = ((X - 50e3) ** 2 + (Y - 50e3) ** 2) < 10e3 ** 2

d = xr.Dataset(
    {
        "u": (["t", "y", "x"], np.ones((t.size, y.size, x.size))),
        "v": (["t", "y", "x"], np.zeros((t.size, y.size, x.size))),
        "mask": (["t1", "y", "x"], island[None, :, :]),
    },
    coords={"x": x, "y": y, "t": t},
)

fs = filtering.LagrangeFilter(
    "masked",
    d,
    {"U": "u", "V": "v", "mask": "mask"},
    {
        "U": {"lon": "x", "lat": "y", "time": "t"},
        "V": {"lon": "x", "lat": "y", "time": "t"},
        "mask": {"lon": "x", "lat": "y"},
    },
    sample_variables=["U"],
    mesh="flat",
    mask=True,
    window_size=24 * 3600,
)

With the following changes to the sample kernel, which basically uses the mask to multiple sampled variables by NaN when they fall inside the mask:

diff --git a/filtering/filtering.py b/filtering/filtering.py
index 96bfc3d..475fd5e 100644
--- a/filtering/filtering.py
+++ b/filtering/filtering.py
@@ -100,6 +100,7 @@ class LagrangeFilter(object):
         dimensions,
         sample_variables,
         mesh="flat",
+        mask=False,
         c_grid=False,
         indices=None,
         uneven_window=False,
@@ -187,23 +188,31 @@ class LagrangeFilter(object):
         # create the particle class and kernel for sampling
         # map sampled variables to fields
         self.particleclass = ParticleFactory(sample_variables)
-        self._create_sample_kernel(sample_variables)
+        self._create_sample_kernel(sample_variables, mask)
         self.kernel = parcels.AdvectionRK4 + self.sample_kernel
 
         # compile kernels
         self._compile(self.sample_kernel)
         self._compile(self.kernel)
 
-    def _create_sample_kernel(self, sample_variables):
+    def _create_sample_kernel(self, sample_variables, mask):
         """Create the parcels kernel for sampling fields during advection."""
 
         # make sure the fieldset has C code names assigned, etc.
         self.fieldset.check_complete()
 
         # string for the kernel itself
+        funcvars = ["particle", "fieldset", "time"]
         f_str = "def sample_kernel(particle, fieldset, time):\n"
+        if mask:
+            funcvars += ["m"]
+            f_str += """
+\tm = 1
+\tif fieldset.mask[0, particle.depth, particle.lat, particle.lon] > 0:
+\t\tm = math.nan
+"""
         for v in sample_variables:
-            f_str += f"\tparticle.var_{v} = fieldset.{v}[time, particle.depth, particle.lat, particle.lon]\n"
+            f_str += f"\tparticle.var_{v} = m * fieldset.{v}[time, particle.depth, particle.lat, particle.lon]\n"
         else:
             f_str += "\tpass"
 
@@ -212,7 +221,7 @@ class LagrangeFilter(object):
             self.fieldset,
             self.particleclass.getPType(),
             funcname="sample_kernel",
-            funcvars=["particle", "fieldset", "time"],
+            funcvars=funcvars,
             funccode=f_str,
         )

As part of this, we can use the linear_invdist_land_tracer interpolation method on non-velocity fields to make sure we don't incorporate any junk data. However, this requires the field to be (almost) exactly zero at land points. For data that isn't already in this format, it can be a significant pre-processing overhead. It would be good to be able to do this on the fly.