openradar/PyDDA

Example of running PyDDA on HPC?

Opened this issue · 3 comments

The example of nested wind retrieval in the docs is based on LocalCluster. Is PyDDA designed to be run on HPC systems such as Summit or TianHe-2? If yes, I think it would be really helpful to have an example of the best strategy for splitting the grid and distributing the computations to workers under Dask, with an eye toward maximizing CPU usage and balancing I/O time, including the settings for the number of jobs/n_workers/processes, etc.

It can be run on an HPC cluster using Dask distributed! My best strategy has been to dedicate an entire node to one worker, since the optimizer will use all of the cores available when doing the calculation. From there you can then use multiple nodes.

Thanks! Another thing I would like to know: is there a recommended setting for 'num_split'? I think a larger value will split the whole grid into more subgrids, so we can use more nodes for the calculation, but more subgrids will cost much more time processing the subgrid temp files or on I/O between the calculation nodes. Is that true? If so, is there a best 'num_split' that can balance this?