treverhines/RBF

Most effective way to scale up to HPC on a large mesh problem

GiampieroB opened this issue · 2 comments

Hello,
I'm trying to interpolate a noisy dataset onto a very large mesh. I have the chance to submit this task to an HPC machine with a significant amount of computational resources (up to a few hundred cores without too much hassle, even more if really needed, terabytes of RAM, etc.), but to do so I need at least a rough estimate of the resources required. Unfortunately, my laptop can barely run my code on a reduced-dimensionality version of the problem, so I'm not confident that testing different strategies locally is an effective way to benchmark.

To give some context, the complete mesh I want to populate is 768x1024x20x768x1024 (~10^13 points); let's call these dimensions u, v, wave, x, y respectively. On my laptop I run the interpolant in nested loops over u, v, wave, building a mesh with fixed U, V, WAVE at each iteration, like this:

for U in u:
    for V in v:
        for WAVE in wave:
            # build the (768*1024, 5) evaluation mesh for this fixed (U, V, WAVE) slice
            mesh = make_mesh(U, V, WAVE)
            # evaluate the 5-D kernel interpolant on the slice and store it
            result = kernel_interpolant_5d(mesh)
            result = result.reshape((768, 1024))
            complete_dataset[U, V, WAVE, :, :] = result

The kernel interpolant I'm using has sigma != 0, neighbors = 50-80, phi = "wen12" (I need compact support because the dynamic range I'm interested in extends down to really small values, and ripples spreading over the whole mesh are messing with my results).
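Concretely, each per-slice interpolant looks roughly like the sketch below (simplified, with synthetic data and made-up parameter values, and assuming the RBFInterpolant class from rbf.interpolate; my real arrays are far larger):

import numpy as np
from rbf.interpolate import RBFInterpolant

# synthetic stand-in for my data: scattered points in (u, v, wave, x, y)
rng = np.random.default_rng(0)
y_obs = rng.uniform(0.0, 1.0, (5000, 5))
d_obs = np.exp(-10.0 * np.sum((y_obs - 0.5)**2, axis=1)) + 0.01 * rng.normal(size=5000)

interp = RBFInterpolant(
    y_obs,
    d_obs,
    sigma=0.1,     # nonzero smoothing, since the data are noisy (value is illustrative)
    phi='wen12',   # compactly supported Wendland kernel
    neighbors=60,  # somewhere in the 50-80 range mentioned above
)

# evaluate one fixed (U, V, WAVE) slice on a reduced x-y grid (768x1024 in the real run)
xg, yg = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64), indexing='ij')
mesh = np.column_stack([
    np.full(xg.size, 0.5),  # fixed U
    np.full(xg.size, 0.5),  # fixed V
    np.full(xg.size, 0.5),  # fixed WAVE
    xg.ravel(),
    yg.ravel(),
])
result = interp(mesh).reshape((64, 64))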
To scale this up effectively, I don't know which of the following is the better option:

  1. Do not parallelize the code and scale up the mesh directly
  2. Embarrassingly parallelize the loop: spawn many jobs, each working on a reduced mesh, possibly running a few tests first to see how many cores per job are effective.

Can someone point me in the right direction? I have limited time to experiment on this and almost no HPC experience.

I would say go for option 2. Spawn jobs where each job is responsible for evaluating a different chunk of the mesh.
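For example, something along these lines (a rough sketch, not part of this package; make_mesh and kernel_interpolant_5d are the helpers from your snippet, and the SLURM environment variables are just illustrative placeholders for whatever your scheduler provides):

import os
import numpy as np

# split the (u, v, wave) loop across an array of independent jobs
# (assuming a 0-based --array range; adapt to your scheduler)
job_id = int(os.environ.get('SLURM_ARRAY_TASK_ID', '0'))
n_jobs = int(os.environ.get('SLURM_ARRAY_TASK_COUNT', '1'))

n_u, n_v, n_wave = 768, 1024, 20
total = n_u * n_v * n_wave

# round-robin assignment of (U, V, WAVE) slices to this job
for flat in range(job_id, total, n_jobs):
    iu, rem = divmod(flat, n_v * n_wave)
    iv, iw = divmod(rem, n_wave)
    mesh = make_mesh(iu, iv, iw)
    result = kernel_interpolant_5d(mesh).reshape((768, 1024))
    np.save(f'slice_{iu}_{iv}_{iw}.npy', result)  # write each slice; assemble afterwards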

How big is your dataset? The memory requirements for RBF interpolation are constrained by the number of observation points rather than the number of evaluation points, since the evaluation points can be evaluated in chunks or one at a time without changing the results.
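For example, something like this keeps the evaluation memory bounded no matter how many evaluation points there are (a sketch assuming the RBFInterpolant class from rbf.interpolate; obs_points, obs_values, eval_points, and the chunk size are placeholders):

import numpy as np
from rbf.interpolate import RBFInterpolant

# memory is dominated by the observations (obs_points, obs_values); the
# evaluation points can be streamed through in blocks of any convenient size
interp = RBFInterpolant(obs_points, obs_values, sigma=0.1, phi='wen12', neighbors=60)

chunk = 100_000  # arbitrary block size
out = np.empty(len(eval_points))
for start in range(0, len(eval_points), chunk):
    stop = start + chunk
    out[start:stop] = interp(eval_points[start:stop])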

If your dataset is small (a couple thousand observations or less) then it may be faster to have neighbors=None, which would also help remove interpolation artifacts where the neighborhood of nearest observation points changes. Also, you may want to consider wen10 or phs1 if you are concerned about the ringing artifacts you get when interpolating with a smooth RBF (assuming that is the cause of the ripples you are talking about).
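That would look something like this (same caveats and placeholder names as the sketch above):

from rbf.interpolate import RBFInterpolant

# global interpolant: every observation contributes to every evaluation point,
# so there are no artifacts from the nearest-neighbor set changing; 'phs1'
# (or 'wen10') tends to ring less than a smooth kernel
interp = RBFInterpolant(obs_points, obs_values, sigma=0.1, phi='phs1', neighbors=None)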

Thank you for the fast reply. The observation points number between 10^6 and 10^7, more or less. Now I'm able to share some plots of the data I have to work with. As you can see, the data are very sparse and noisy, and I need to cover a large dynamic range (at least 10^-6 to 10^0).
[plot of the observation data]
The observation-point dataset can also be decomposed into multiple, partially overlapping datasets. Would the size of the observation dataset be the main driver of computation time when running on an HPC node? For reference, I would start with 48 cores and 768 GB of RAM.