DHI/mikeio

Extracting data from a large .dfs2 file

Closed this issue · 4 comments

AL89 commented

Hi there

I don't know if there is a solution to my problem, but I will describe it anyway.

I have a large dfs2 file containing 1440 time steps at 1-minute resolution, which adds up to one day. The spatial grid is nx = 401, ny = 401 cells with dx = dy = 250. That amounts to 401 x 401 x 1440 = 231,553,440 cell values, and the file is around 1 GB. Whenever I read it with mikeio, it takes around 25 seconds.
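(For reference, the file size matches 32-bit floats; a quick check, assuming 4 bytes per value:)

nx, ny, nt = 401, 401, 1440
n_values = nx * ny * nt    # 231553440 cell values
print(n_values * 4 / 1e9)  # ~0.93 GB, i.e. 'around 1 GB' at 4 bytes per value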

import mikeio

dfs2 = mikeio.Dfs2(filename='data.dfs2')
ds = dfs2.read()
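The 25 seconds is simply wall-clock time around the read, measured roughly like this:

import time
import mikeio

t0 = time.perf_counter()
dfs2 = mikeio.Dfs2(filename='data.dfs2')
ds = dfs2.read()
print(f'read took {time.perf_counter() - t0:.1f} s')  # around 25 s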

Now, I know I want to spatially trim this file down to a smaller region of only nx = 20, ny = 43 cells. This would decrease the amount of data and the file size significantly, and I know a way to do it: I noticed the area parameter in the read(...) method, which lets me choose the proper bounding box coordinates.

ds_small = dfs2.read(area=(left, lower, right, upper))  # bounding box of the region of interest

However, "reading" this decreased amount of data (ds_small) takes just as long as "reading" the original file (ds). How come? I thought the specified bounding box corresponds to the reading/loading time? Obviously not.

Despite my disappointment: am I doing it right, or is there another way to decrease the reading time?

Thanks in advance.

AL89 commented

Hi @ecomodeller. Are you saying that the argument area doesn't work as intended?

ecomodeller commented

The area argument allows you to read a subset, even from a file that would not fit in memory, so it works as intended, but it is far from an optimal solution.

The problem is here:

itemdata = self._dfs.ReadItemTimeStep(item_numbers[item] + 1, int(it))

We can subset items and time, but not space.
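Subsetting items and time does reduce the amount of data read, e.g. (a minimal sketch; note that the time-selection keyword is time_steps in older mikeio versions and time in newer ones):

ds_hour = dfs2.read(items=[0], time_steps=list(range(60)))  # first item, first hour of time steps only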

This would have to be added in mikecore, and ultimately in the underlying ufs C library.
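If you need to read the same region repeatedly, one possible workaround (a sketch, not a fix of the subsetting itself) is to crop once and save the subset as a new, much smaller dfs2 file, and read that from then on:

ds_small = dfs2.read(area=(left, lower, right, upper))
ds_small.to_dfs('data_small.dfs2')  # Dataset.to_dfs writes the subset in recent mikeio versions

Reading data_small.dfs2 afterwards is then fast, simply because the file is small.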

AL89 commented

Okay, I understand.
I am glad, though, that the current solution works and uses less memory than loading the original file.

I guess you can close the issue now.