A parallel processing program based on Dask for processing URI Hurricane Boundary Layer Model's Wind output.
-
A three-dimensional model developed with a focus on improving surface wind forecast during landfall of hurricanes.
-
Horizontal and vertical resolution of the model is 1km and 30m; respectively.
-
The wind outputs are saved at every minute interval.
-
Model uses a vortex-following moving system.
-
Computational routines are written in Fortran and the model uses Intel’s Message Passing Interface (MPI) to run in parallel across different nodes and CPUs.
-
Because of higher spatial and temporal resolution, output netCDF files can get very large (~20-30 GBs).
-
Programs i.e. NCAR Command Language (NCL), Matlab takes longer time to process the data; usually an hour to process a HBL forecast data of a day (1440 minutes).
-
In addition to that, Matlab and NCL doesn’t have parallel processing system.
-
A parallel post-processing program for an MPI-based Hurricane Boundary Layer Model.
-
To meet the demand of operational forecast that requires faster & efficient analysis within a limited time range for decision making.
-
To take the advantage of recent advancement in High-Performance computing system.
-
Significant progress in open-source software development.
-
Xarray is a python package that is developed to work efficiently with multi-dimensional array .
-
Dask is a python-based program focused on scaling arrays i.e. Numpy, Pandas , Xarray.DataArray etc. on single CPUs or clusters.
-
Dask has simple routines i.e. Dask.Delayed which can easily parallelize any python function to run on multiple CPUs.
-
We will be using dask.delayed and dask.array.map_blocks to process the output from HBL model.
-
two notebooks are provided:
-
one using dask.delayed function which distribute the plotting function as well as whole datasets in to multiple CPUs. Might not be useful if the data-array is too large.
-
another one using dask array map_blocks which creates chunks of data and distribute each chunks as well as data array and plotting function across CPUs.
-
-
Execute each cell; possible edits are needed in second cell depending on your compute architecture. These two notebooks are written and excecute in a SLURM cluster using 30 CPUs.
-
Please email me at mansur@uri.edu if you need any help!