Is there any way to parallelize the computation of multiple convolutions?
Hi,
First, I have a 3D density mesh (Nmesh=1536^3) obtained with nbodykit and stored as a bigfile on my disk.
Then I want to convolve the mesh with a band-pass filter at multiple scales, which is equivalent to a multiplication in the Fourier domain. Finally, at each scale, I compute a binned statistic env_WPS
of the convolved mesh. My code snippet is shown below.
import numpy as np
from scipy import stats
from nbodykit.lab import BigFileMesh
def bandpass(scale):
    '''The band-pass filter in the Fourier space
    '''
    def filter( k, v ):
        cn = 4.0*np.sqrt( np.sqrt(np.pi**3)/(9.0+55.0*np.e) )
        k2 = k[0]**2 + k[1]**2 + k[2]**2
        k2 /= scale**2
        k = np.sqrt(k2)
        filter_ = cn*( np.exp(k-0.5*k2)*(k2-k) + np.exp(-0.5*k**2-k)*(k2+k) )
        filter_ /= np.sqrt(scale**3)
        return v*filter_
    return filter
# Load the density mesh
path_mesh = '/.../.../dens_field.bigfile'
dens_mesh = BigFileMesh(path_mesh, 'Field')
# Parameters
Lbox = dens_mesh.attrs['BoxSize'][0] # Unit: Mpc/h
Nmesh = dens_mesh.attrs['Nmesh'][0] # Integer
kf = 2*np.pi/Lbox
kNyq = Nmesh*np.pi/Lbox
scales = np.geomspace(kf, 0.4*kNyq, 25)
dens_bins = np.geomspace(0.1,100,7,endpoint=True) # The density bins
# Compute the volume fraction of each density environment
dens_m = dens_mesh.compute(mode='real', Nmesh=Nmesh)
dens_m_ = np.ravel(dens_m)
dens_bins_ = np.pad( dens_bins, (1, 1), 'constant', constant_values=(0,np.amax(dens_m)) )
Nvol, _, _ = stats.binned_statistic(dens_m_, dens_m_, 'count', bins=dens_bins_)
fvol = Nvol/Nmesh**3
# Initialize the env_WPS and global_WPS
env_WPS = np.zeros( (len(scales), len(dens_bins)+1) )
global_WPS = np.zeros_like( scales )
# Perform the CWT and compute the env-WPS and global-WPS
for i, scale in enumerate(scales):
    cwt_mesh = dens_mesh.apply(bandpass(scale), mode='complex', kind='wavenumber')
    cwt_rmesh = cwt_mesh.compute(mode='real', Nmesh=Nmesh)
    cwt2 = cwt_rmesh**2
    env_WPS[i,:], _, _ = stats.binned_statistic(dens_m_, np.ravel(cwt2), 'mean', bins=dens_bins_)
    global_WPS[i] = np.sum( env_WPS[i,:]*fvol )
Since the above code uses only one core, it is inefficient for such a large Nmesh. How can I parallelize it in the context of nbodykit?
- Run the code under mpirun. Each MPI process will then get a part of the full density field.
- I think 'stats.binned_statistic' would then only compute partial statistics on the part of the full density field that is local to each process. You will likely need to call mpi.COMM_WORLD.allreduce(...) on the result of stats.binned_statistic to add up the partial statistics (undo the mean first), and then recompute the mean; a sketch of this follows below.
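A minimal sketch of that suggestion, reusing the names from the snippet above (dens_mesh, dens_m, dens_m_, dens_bins, scales, bandpass, Nmesh, env_WPS, global_WPS) and assuming the script is launched under mpirun so that each rank's dens_m_ holds only its local slab of the mesh. Treat it as a starting point rather than a drop-in replacement:

import numpy as np
from scipy import stats
from mpi4py import MPI

comm = MPI.COMM_WORLD

# The bin edges must be identical on every rank, so take the global maximum.
dens_max = comm.allreduce(np.amax(dens_m), op=MPI.MAX)
dens_bins_ = np.pad(dens_bins, (1, 1), 'constant', constant_values=(0, dens_max))

# Global counts per density bin: sum the rank-local counts.
Nvol_loc, _, _ = stats.binned_statistic(dens_m_, dens_m_, 'count', bins=dens_bins_)
Nvol = comm.allreduce(Nvol_loc, op=MPI.SUM)
fvol = Nvol/Nmesh**3

for i, scale in enumerate(scales):
    cwt_mesh = dens_mesh.apply(bandpass(scale), mode='complex', kind='wavenumber')
    cwt_rmesh = cwt_mesh.compute(mode='real', Nmesh=Nmesh)
    cwt2 = np.ravel(cwt_rmesh**2)
    # Undo the local mean (multiply by the local counts), sum the partial
    # sums across ranks, then divide by the global counts to get the global mean.
    mean_loc, _, _ = stats.binned_statistic(dens_m_, cwt2, 'mean', bins=dens_bins_)
    sum_loc = np.where(Nvol_loc > 0, mean_loc*Nvol_loc, 0.0)  # empty local bins give NaN means
    env_WPS[i,:] = comm.allreduce(sum_loc, op=MPI.SUM)/Nvol
    global_WPS[i] = np.sum(env_WPS[i,:]*fvol)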
Thank you for your nice reply; I will try what you said.
Hi Yu,
The method you suggested does work. However, the parallelization is not very efficient. For example, I used a 768^3 mesh as a test. With only one core, the run takes 1407 seconds; with 64 cores, 722 seconds; with 96 cores, 611 seconds.
So turning on the parallelization only roughly halves the run time, rather than reducing it to something like 1407/64 ≈ 22 s or 1407/96 ≈ 15 s.
Why is this the case?
The operations inside the for loop consume most of the time.
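For what it's worth, a rough (hypothetical) way to check that, and to see how much of each iteration is spent in the FFT/filter step versus the binning step, is to time the two separately with a barrier so the intervals are comparable across ranks. The helper synced_time is made up for this sketch, and the other names again refer to the snippet above:

import time
from mpi4py import MPI

comm = MPI.COMM_WORLD

def synced_time():
    # Synchronize the ranks before reading the clock so the intervals line up.
    comm.Barrier()
    return time.time()

t0 = synced_time()
cwt_mesh = dens_mesh.apply(bandpass(scales[0]), mode='complex', kind='wavenumber')
cwt_rmesh = cwt_mesh.compute(mode='real', Nmesh=Nmesh)   # filter + FFTs, communication-heavy
t1 = synced_time()
mean_loc, _, _ = stats.binned_statistic(dens_m_, np.ravel(cwt_rmesh**2), 'mean', bins=dens_bins_)
t2 = synced_time()
if comm.rank == 0:
    print('apply+compute: %.1f s, binned_statistic: %.1f s' % (t1 - t0, t2 - t1))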
This really improves the efficiency. I will acknowledge you in my new paper.