DHI/mikeio

Extracting data from a large .dfs2 file

Closed this issue · 4 comments

AL89 commented

Hi there

I don't know if there is a solution to my problem, but I will describe it anyway.

I have a large dfs2 file containing 1440 time steps at 1-minute resolution, which adds up to one day. The spatial grid is nx = 401, ny = 401 cells with dx = dy = 250. That amounts to 401 x 401 x 1440 = 231,553,440 cell values, and the file is around 1 GB. Whenever I read it with mikeio, it takes around 25 seconds.
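(For reference, the file size matches 32-bit floats; a quick check, assuming 4 bytes per value:)

nx, ny, nt = 401, 401, 1440
n_values = nx * ny * nt    # 231553440 cell values
print(n_values * 4 / 1e9)  # ~0.93 GB, i.e. 'around 1 GB' at 4 bytes per value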

import mikeio

dfs2 = mikeio.Dfs2(filename='data.dfs2')
ds = dfs2.read()
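The 25 seconds is simply wall-clock time around the read, measured roughly like this:

import time
import mikeio

t0 = time.perf_counter()
dfs2 = mikeio.Dfs2(filename='data.dfs2')
ds = dfs2.read()
print(f'read took {time.perf_counter() - t0:.1f} s')  # around 25 s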

Now, I know I want to spatially trim this file down to a smaller region of only nx = 20, ny = 43 cells. This would decrease the amount of data and the file size significantly, and I know a way to do it: I noticed the area parameter in the read(...) method, which lets me choose the proper bounding box coordinates.

ds_small = dfs2.read(area=(left, lower, right, upper))  # bounding box of the region of interest

However, "reading" this decreased amount of data (ds_small) takes just as long as "reading" the original file (ds). How come? I thought the specified bounding box corresponds to the reading/loading time? Obviously not.

Despite my disappointment: am I doing it right, or is there another way to decrease the reading time?

Thanks in advance.

AL89 commented

Hi @ecomodeller. Are you saying that the argument area doesn't work as intended?

ecomodeller commented

The area argument allows you to read a subset, even from a file that would not fit in memory, so it works as intended, but it is far from an optimal solution.

The problem is here:

itemdata = self._dfs.ReadItemTimeStep(item_numbers[item] + 1, int(it))

We can subset items and time, but not space.
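Subsetting items and time does reduce the amount of data read, e.g. (a minimal sketch; note that the time-selection keyword is time_steps in older mikeio versions and time in newer ones):

ds_hour = dfs2.read(items=[0], time_steps=list(range(60)))  # first item, first hour of time steps only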

This would have to be added in mikecore, and ultimately in the underlying ufs C library.
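If you need to read the same region repeatedly, one possible workaround (a sketch, not a fix of the subsetting itself) is to crop once and save the subset as a new, much smaller dfs2 file, and read that from then on:

ds_small = dfs2.read(area=(left, lower, right, upper))
ds_small.to_dfs('data_small.dfs2')  # Dataset.to_dfs writes the subset in recent mikeio versions

Reading data_small.dfs2 afterwards is then fast, simply because the file is small.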

AL89 commented

Okay, I understand.
I am glad, though, that the current solution works and uses less memory than loading the original file.

I guess you can close the issue now.