DHI/mikeio

Smart extraction / interpolation of points from large dfsu

Hendrik1987 opened this issue · 4 comments

I still struggle to extract data from larger-than-memory .dfsu files.

Or maybe I am just not aware of the newest way to do this in mikeio?

What I have tried / done so far:

  1. read only first timestep of .dfsu
  2. identify n nearest elements
  3. read all data only for the identified elements
```python
import numpy as np
import mikeio

# fn is the path to the large .dfsu file; pts holds the query coordinates
dfs = mikeio.open(fn)
mds0 = dfs.read(time=0)  # first timestep only, to get the geometry cheaply
elements = mds0.geometry.find_nearest_elements(x=pts.x, y=pts.y, n_nearest=3)
elements_load = np.unique(elements)  # drop duplicate elements before reading
mds = dfs.read(elements=elements_load)  # all timesteps, selected elements only
mds[0].plot()
```
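The index bookkeeping in steps 2 and 3 can be sketched with plain NumPy (the element ids below are invented): `np.unique` avoids reading overlapping neighbourhoods twice, and, assuming `dfs.read` returns the data in `elements_load` order, `np.searchsorted` recovers each element's position inside the loaded subset:

```python
import numpy as np

# Hypothetical output of find_nearest_elements for 3 query points with
# n_nearest=3 (shape: n_points x n_nearest); the element ids are invented
elements = np.array([[10, 11, 42],
                     [11, 42, 43],
                     [100, 101, 102]])

# Deduplicate before reading, so overlapping neighbourhoods are loaded once
elements_load = np.unique(elements)  # sorted: [10, 11, 42, 43, 100, 101, 102]

# After dfs.read(elements=elements_load) the data follows elements_load
# order, so map each original element id back to its position in the subset
subset_pos = np.searchsorted(elements_load, elements)
```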


  4. loop over the query points and interpolate from n nearest elements:

```python
mdsi = [mds.interp(x=x, y=y, n_nearest=3) for x, y in zip(pts.x, pts.y)]
```
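For reference, interpolating from the n nearest element centres is typically done with inverse-distance weighting; the sketch below shows that kind of weighting in plain NumPy (it is not mikeio's actual implementation, and the distances and values are invented):

```python
import numpy as np

def idw_weights(dist, p=2):
    # Inverse-distance weights for one query point; dist holds distances to
    # the n_nearest element centres. A (near-)zero distance gets all weight.
    dist = np.asarray(dist, dtype=float)
    exact = dist < 1e-12
    if exact.any():
        w = exact.astype(float)
    else:
        w = 1.0 / dist**p
    return w / w.sum()

# The interpolated value is the weighted sum of the n_nearest element values
vals = np.array([1.0, 2.0, 4.0])  # invented element values
w = idw_weights([1.0, 2.0, 4.0])  # invented distances
estimate = float((w * vals).sum())
```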


What I think could be enhanced:

  1. Make this (or a better) workflow easily available, maybe via mikeio.read(x, y, n_nearest = 3) or dfs.extract_stations() as sister to dfs.extract_track()?
  2. Is there a way to then concat the mikeio.Datasets in the list mdsi, maybe resulting in a GeometryPoint2DCollection geometry?
  3. How do I write the extracted points to .dfs, while maintaining geometry information?

Possibly related to fmskill enhancement

  • Here I also struggle to extract from large .dfsu model results. Maybe some of the same logic applies here?
  • ModelResult should be able to use a mikeio.Dataset/DataArray (DHI/modelskill#132)
  1. No, and I'm not sure it is a good idea to create such a geometry, since the existing geometries Grid1D, Grid2D, Grid3D, GeometryFM, GeometryFM3D correspond to different types of dfs files. I don't think we have an appropriate file format for a GeometryPoint2DCollection.
  2. It is possible to store the location for each item in a Dfs0 file (see mikecore example), but it is still not as flexible as a NetCDF, where you could have two dimensions (station, time) since there is no way of storing the name of the station other than in the name of the item e.g. "Buoy 2: Sign. Wave Height" or other station specific metadata.
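The item-name workaround mentioned above can be illustrated with plain string handling (the "station: quantity" name format is only the convention from the example, not a mikeio API):

```python
def split_item_name(item_name):
    # Split a "station: quantity" item name, e.g. "Buoy 2: Sign. Wave Height".
    # Returns ("", item_name) when no station prefix is present. Illustrative
    # only: in a Dfs0 the item name is the only place this metadata can live.
    if ": " in item_name:
        station, quantity = item_name.split(": ", 1)
        return station, quantity
    return "", item_name

station, quantity = split_item_name("Buoy 2: Sign. Wave Height")
# station -> "Buoy 2", quantity -> "Sign. Wave Height"
```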
  1. And what about this point... ? :-)
  2. I see and understand the motivation.
  3. Yes, the "Meta_data: Parameter" is also the workaround used when outputting .dfs0 directly from the MIKE model. I don't think it is a nice solution; maybe the .nc, or the .dfs0 + .json, or similar is the better choice.
    However, at the moment the GeometryPoint2D coordinates seem to be lost on .to_xarray(). So even after conversion I currently cannot easily concatenate the xarray.Datasets.
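Until the coordinates survive .to_xarray(), one workaround is to re-attach them manually and concatenate along a new points dimension; a plain-xarray sketch, where the names (points, x, y, wl) and all values are purely illustrative:

```python
import numpy as np
import xarray as xr

# Re-attach the lost point coordinates by hand, one dataset per station,
# then concatenate along a new "points" dimension
dss = []
for i, (x, y) in enumerate([(1.0, 2.0), (3.0, 4.0)]):
    ds = xr.Dataset({"wl": ("time", np.array([0.1 * i, 0.2 * i]))},
                    coords={"time": [0, 1]})
    ds = ds.assign_coords(x=x, y=y).expand_dims(points=[i])
    dss.append(ds)

# Differing scalar coords (x, y) become (points,)-coordinates on concat
combined = xr.concat(dss, dim="points")
```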

3. See PR #483, which should take you one step closer to being able to concatenate the data

Thanks, that is a fix for 3, @ecomodeller

Taking a step back here, what I generally try to achieve is an interpolation along a new dimension. To do the same on a structured NetCDF file I would use this xarray equivalent.

In my special case I have a list of x, y coordinates, and my new dimension would be points. All the steps above are just my current workaround, which also handles a larger-than-memory dfsu file.

I understand the motivation to keep geometry types and .dfs files aligned.

Is the conclusion then to work on a dfsu.extract_points((x, y), n_nearest) -> xr.Dataset, where (x, y) can be a list or array of coordinates? (Or maybe a DataFrame, which would also allow specifying further coordinates such as station_name.)
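As a sketch of what such a helper could look like on the user side, built from the loop earlier in this thread (the function name mirrors the proposal; that the interpolated result exposes `.to_numpy()` is an assumption of the sketch, not a confirmed mikeio API):

```python
import numpy as np

def extract_points(mds, xs, ys, n_nearest=3):
    # Interpolate an in-memory (subset) dataset to a list of coordinates and
    # stack the results along a new leading "points" dimension.
    # `mds` is anything with an .interp(x=..., y=..., n_nearest=...) method
    # whose result has .to_numpy() -- an assumption in this sketch.
    columns = [mds.interp(x=x, y=y, n_nearest=n_nearest).to_numpy()
               for x, y in zip(xs, ys)]
    data = np.stack(columns, axis=0)  # shape: (points, time, ...)
    coords = {"x": np.asarray(xs), "y": np.asarray(ys)}
    return data, coords
```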