DHI/mikeio

extract_track bug

Opened this issue · 8 comments

Describe the bug
Hi, I was facing issues with modelskill track comparisons, so I had to dig deeper (thanks to JAN for the tips), but there seems to be an issue with the .extract_track tool when working with UTM coordinates at least.
I ended up running dataextractionfm.exe and that one worked fine, but it breaks the workflow of having a single notebook.

The issue is that it is not finding the paired data, when indeed there is.

To Reproduce
I am uploading a zip file with csv file of tracks, a 1-day dfsu with results, and the full workflow (notebook) to reproduce the issue.
Bug.zip

System information:

  • Python version: 3.10.8
  • MIKE IO version : 1.6.2

The error is caused by the GeometryFM2D.contains method, which incorrectly identifies points as outside the domain.

Here is a breakdown of the boundary segments, where some of the exterior segments seem to be classified as interior.

import matplotlib.pyplot as plt
import pandas as pd

import mikeio

df = pd.read_csv('Altmetry_data_debug.csv',index_col=0,parse_dates=True)
df_utm = df[['easting','northing','significant_wave_height']]
ds = mikeio.Dfsu('Model_subset.dfsu').read()
fig, ax = plt.subplots(figsize=(10,4))
g = ds.geometry

for exterior in g.boundary_polylines.exteriors:
    ax.plot(*exterior.xy.T, color='k', linewidth=1.2)
for interior in g.boundary_polylines.interiors:
    ax.plot(*interior.xy.T, color='r', linewidth=0.5, linestyle='dashed')
plt.scatter(df_utm['easting'],df_utm['northing']);

image

@jsmariegaard I suppose the detection if a point is inside the domain or not, is handled in a different way in DataTrackExtractionFM.exe?

This explains why all the modelskill comparisons where done with data in this part only
image

If I replace the current implementation with shapely it seems to work as expected. The reason why this approach is not the default, was that is can be quite slow to construct.

def contains(self, points: np.ndarray) -> np.ndarray:
        """test if a list of points are contained by mesh

        Parameters
        ----------
        points : array-like n-by-2
            x,y-coordinates of n points to be tested

        Returns
        -------
        bool array
            True for points inside, False otherwise
        """
        # import matplotlib.path as mp 

        # points = np.atleast_2d(points)

        # exterior = self.boundary_polylines.exteriors[0]
        # cnts = mp.Path(exterior.xy).contains_points(points)

        # if self.boundary_polylines.n_exteriors > 1:
        #     # in case of several dis-joint outer domains
        #     for exterior in self.boundary_polylines.exteriors[1:]:
        #         in_domain = mp.Path(exterior.xy).contains_points(points)
        #         cnts = np.logical_or(cnts, in_domain)

        # # subtract any holes
        # for interior in self.boundary_polylines.interiors:
        #     in_hole = mp.Path(interior.xy).contains_points(points)
        #     cnts = np.logical_and(cnts, ~in_hole)

        # return cnts

        from shapely.geometry import Point

        domain = self.to_shapely().buffer(0)
        return np.array([domain.contains(Point(p)) for p in points])

image

I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy .contains or .inteserction methods implemented

I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy .contains or .inteserction methods implemented

See this branch https://github.com/DHI/mikeio/tree/geometryFM_contains_shapely

This problem is not only relevant for track extraction.

It is a problem for point extraction as well in some cases when using IDW interpolation.

Below is an example of a tide gauge located on an island, where the 5 nearest elements creates a domain, where the contains method fails.

image

But then why don't we:

  • create a failing test (we can use the files I sent and the example you just showed)
  • change to the solution with shapely
  • pass the tests while sacrificing speed, and then
  • add to the backlog as task of improve_efficiency_contains_points or similar
    ?

But then why don't we:

  • create a failing test (we can use the files I sent and the example you just showed)
  • change to the solution with shapely
  • pass the tests while sacrificing speed, and then
  • add to the backlog as task of improve_efficiency_contains_points or similar
    ?

We will!