hpc/libhio

Optimization strategy for DataWarp

Closed this issue · 2 comments

Hi, I have a question about DataWarp's allocation strategy. According to the DataWarp documentation, the default optimization strategy is "bandwidth", which assigns as many servers as possible (as determined by the capacity request, pool granularity, and available space) to maximize bandwidth. Does the "bandwidth" strategy also consider other factors, such as the current workload on each BB server, to avoid placing new jobs on BB servers that are already very busy? Or does it consider only the requested capacity and available space (as the documentation says) and allocate BB servers to new jobs on a round-robin basis?
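To make the question concrete, here is a rough sketch of the allocation arithmetic as I read it from the documentation. It is only an illustration under my own assumptions, not Cray's actual algorithm: the function name, the parameters, and the idea of counting free granules per server are all hypothetical. Note that nothing in it looks at how busy a server is, which is exactly what I am asking about.

```python
import math

def bandwidth_server_count(capacity_gib, granularity_gib, free_granules):
    """Hypothetical model of the "bandwidth" strategy (not Cray's code).

    capacity_gib    -- capacity requested by the job
    granularity_gib -- pool granularity (smallest allocatable unit)
    free_granules   -- dict mapping server name -> free granules on it
    Returns (granules_needed, servers_used).
    """
    # The request is rounded up to a whole number of granules.
    granules_needed = math.ceil(capacity_gib / granularity_gib)
    # Only servers with at least one free granule can take part.
    usable = [name for name, free in free_granules.items() if free > 0]
    # Spread over as many servers as possible: at most one server per granule.
    servers_used = min(granules_needed, len(usable))
    return granules_needed, servers_used

# Example: 1000 GiB requested, 200 GiB granularity, three servers (one full).
print(bandwidth_server_count(1000, 200, {"bb1": 3, "bb2": 0, "bb3": 5}))
```

In this model only capacity, granularity, and free space decide the outcome; server load never enters the calculation.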

Thank you!

We haven't yet evaluated the allocation strategies at LANL. It is something we are interested in, but it is somewhat outside the scope of libhio until Cray provides the striping support. From what I understand, this support will be in CLE 6.0 UP06. At that point we will start experimenting with what optimizations make sense in libhio.

FWIW, one of our codes is using ~1600 nodes and writing to DataWarp with libhio. This application is seeing IO rates up to 3 TB/sec (around the maximum). There aren't many users, so we don't know how contention will affect these IO rates.

Closing. There are insufficient resources or interest to investigate this currently.