csiro-easi/ccog

Speed up dask graph creation

Opened this issue · 1 comments

artttt commented

When a dask array has many chunks, graph creation can take a few minutes.

In my mind it's not a very complex process, so it should be very fast, but I probably have too simple an understanding of what dask is doing.

I'm wondering if there are any tricks to speed this up. Maybe there is some hashing going on that can be avoided, or dask is trying hard to find graph optimisations that are not there, or something similar.

Note that in the call to to_delayed I turn graph optimisation off. I do this because with optimisation on it ends up repeatedly reading in the same source chunks. Maybe there is a more nuanced way to do this and still get some of the benefits.
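For anyone wanting to reproduce the timing, here is a minimal sketch that measures how long to_delayed takes with graph optimisation off and on. The array is a hypothetical stand-in (lazy zeros, 2,500 chunks) rather than the real ccog data, and `time_to_delayed` is a helper name made up for this example; `optimize_graph` is the keyword that `dask.array.Array.to_delayed` accepts. Actual timings will depend on the dask version and on how deep the real graph is.

```python
import time

import dask.array as da


def time_to_delayed(arr, optimize_graph):
    """Return (seconds, number of delayed blocks) for one to_delayed call."""
    start = time.perf_counter()
    blocks = arr.to_delayed(optimize_graph=optimize_graph)
    elapsed = time.perf_counter() - start
    # to_delayed returns a NumPy object array with one Delayed per chunk.
    return elapsed, blocks.size


# Hypothetical stand-in array: 50 x 50 = 2,500 chunks, nothing is computed.
arr = da.zeros((5_000, 5_000), chunks=(100, 100)) + 1

n_chunks = arr.npartitions            # number of chunks in the array
n_tasks = len(arr.__dask_graph__())   # number of keys in the task graph

print(f"{n_chunks} chunks, {n_tasks} tasks in the graph")

results = {}
for flag in (False, True):
    seconds, n_blocks = time_to_delayed(arr, flag)
    results[flag] = n_blocks
    print(f"optimize_graph={flag}: {n_blocks} blocks in {seconds:.3f} s")
```

Raising the array shape (or shrinking the chunks) should show how the time grows with chunk count, which may help narrow down whether the cost is in optimisation or in building the per-chunk Delayed objects.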

artttt commented

Also, memory use balloons as the graph gets large and slow to build.

Would love a fix where the answer isn't "make chunks larger and use larger-memory workers".