DHI/mikeio1d

Performance of reading result files


The performance of reading result files could be considerably improved.

Here are two examples of reading files on my machine (16 GB of RAM, 4 cores 2.7 GHz, Samsung MZVLW512HMJP drive):

  • A ~1.1 GB network result file with ~58000 time series and ~4300 time steps currently takes the following time:

    1. Loading the data into a ResultData object takes ~20 seconds and ~1.1 GB of memory
    2. Copying the in-memory data from ResultData to a data frame takes ~220 seconds and ~2.2 GB of memory
  • A ~1.5 GB catchment result file with ~27000 time series and ~15000 time steps currently takes the following time:

    1. Loading the data into a ResultData object takes ~12 seconds and ~1.5 GB of memory
    2. Copying the in-memory data from ResultData to a data frame takes ~210 seconds and ~3.0 GB of memory

I think step 2 should be at least a factor of 10 faster, because it only copies data that is already in memory. I suspect the bottleneck is the Python to C# interop.
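To illustrate the kind of overhead I mean, here is a minimal sketch that compares copying a buffer one element at a time with copying it in a single call. It uses only the Python standard library as a stand-in: it does not touch mikeio1d or the C# interop layer, and `copy_elementwise` and `copy_bulk` are hypothetical names for this example only. If each element access crosses the interop boundary, the per-item cost is probably much higher than what this pure Python loop shows.

```python
from array import array
import time


def copy_elementwise(source, n):
    """Copy n values one at a time, the way a per-item loop would."""
    out = array("f", bytes(4 * n))  # preallocated, zero-filled float32 buffer
    for i in range(n):
        out[i] = source[i]
    return out


def copy_bulk(source):
    """Copy the whole buffer in a single call."""
    return array("f", source)


n = 200_000  # hypothetical size, far smaller than the files described above
source = array("f", range(n))

start = time.perf_counter()
slow = copy_elementwise(source, n)
elementwise_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = copy_bulk(source)
bulk_seconds = time.perf_counter() - start

print(f"element-wise: {elementwise_seconds:.4f} s, bulk: {bulk_seconds:.4f} s")
```

Both functions produce the same data; only the number of individual accesses differs, which is why I expect a bulk transfer to help in step 2.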

The upcoming pull request improves step 2 as follows:

  • Network result file: ~10 seconds and 1.1 GB of memory
  • Catchment result file: ~6 seconds and 1.5 GB of memory
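For reference, the speedup in step 2 can be worked out from the approximate timings quoted in this issue. The snippet below only does that arithmetic; the dictionary names are for illustration.

```python
# Step 2 timings in seconds, taken from the approximate figures in this issue.
before = {"network": 220, "catchment": 210}
after = {"network": 10, "catchment": 6}

# Speedup factor = old time / new time for each file.
speedup = {name: before[name] / after[name] for name in before}

for name, factor in speedup.items():
    print(f"{name}: ~{factor:.0f}x faster")
# prints:
# network: ~22x faster
# catchment: ~35x faster
```

Both files clear the factor of 10 mentioned above, and the memory use in step 2 is roughly halved (2.2 GB to 1.1 GB and 3.0 GB to 1.5 GB).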