MPI-capable two-point correlation functions.
See the manodeep/Corrfunc cheatsheet.
- Xi(r): 3d realspace in spheres with radius r
- Xi(r_p, pi): 3d realspace in cylinders with radius r_p and z distance pi
- Xi(s, mu): ???
- w_p(r_p): 2d projected in circles with radius r_p
- Xi(r): ???
- pN(m): ???
This doesn't implement any 2pcf finding code itself. For that we use https://github.com/manodeep/Corrfunc, the best available multicore, single-machine code. To run on multiple machines we:
- Parcel out sub-regions to the machines
- Write new io functions to read only the data each rank needs
- Collate the results
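The three steps above can be sketched in a few lines of NumPy. This is not the code in this repository and it uses neither MPI nor Corrfunc: ranks are simulated with a loop, the pair counter is brute force, the box is treated as non-periodic, and the slab-plus-halo split along x is only an assumption about how sub-regions might be parcelled out. All function names are hypothetical.

```python
import numpy as np

def pair_counts(a, b, edges):
    """Brute-force count of ordered pairs (p in a, q in b) per separation bin."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # A point paired with itself has d == 0, which falls below edges[0] > 0.
    return np.histogram(d, bins=edges)[0]

def decomposed_counts(pos, edges, boxsize, nranks):
    """Sum the per-slab counts; each simulated rank owns one slab along x."""
    rmax = edges[-1]
    width = boxsize / nranks
    x = pos[:, 0]
    # Step 1: parcel out sub-regions (every point has exactly one owner).
    owner = np.minimum((x // width).astype(int), nranks - 1)
    total = np.zeros(len(edges) - 1, dtype=np.int64)
    for rank in range(nranks):
        lo, hi = rank * width, (rank + 1) * width
        owned = pos[owner == rank]
        # Step 2: a rank only needs its slab plus a halo of width rmax.
        nearby = pos[(x >= lo - rmax) & (x <= hi + rmax)]
        # Step 3: collate; with real MPI this sum would be a reduce.
        total += pair_counts(owned, nearby, edges)
    return total

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(200, 3))
edges = np.linspace(0.1, 2.0, 6)  # first edge > 0 excludes self-pairs
split = decomposed_counts(pos, edges, boxsize=10.0, nranks=4)
whole = pair_counts(pos, pos, edges)
print(split, whole)
```

Because each point is owned by exactly one slab and pairs are counted as ordered (owner, partner) pairs, summing over slabs reproduces the single-machine counts without double counting across boundaries.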
See the makefile for more examples, but you can run it with something like:
mpirun -n 4 main xi_r \
--filename1 ./inputs/ascii_input.txt \
--format a --binfile ./inputs/bins --boxsize 10 --nthreads 1 \
--autocorr 1 --periodic 0
2M ASCII input, periodic, autocorrelation, 24 threads per node, ignoring IO:
- 1 node: 5.4s
- 2 nodes: 3s
- 4 nodes: 2s
- 8 nodes: 1.5s
Doubling the number of nodes gives a ~1.5x improvement, so the speedup scales roughly as sqrt(n). This is surprising; I thought it would be much closer to linear.
ASCII IO performance is awful: reading this file takes 4s.
Tried with 12 threads rather than 24 and it was very slightly worse: 3.2s for 4 nodes.
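To put a number on the scaling claim above, a minimal sketch that fits speedup = n^k to the four timings listed, using a least-squares slope in log-log space. The fitting approach and the function name are illustrative choices, not part of this repository; the timings are copied from the list above.

```python
import math

nodes = [1, 2, 4, 8]
seconds = [5.4, 3.0, 2.0, 1.5]  # timings from the list above

def scaling_exponent(nodes, seconds):
    """Least-squares slope of log(speedup) against log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(seconds[0] / t) for t in seconds]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

speedups = [seconds[0] / t for t in seconds]
exponent = scaling_exponent(nodes, seconds)
print("speedups:", [round(s, 2) for s in speedups])
print("fitted exponent k:", round(exponent, 2))
```

The fitted exponent comes out near 0.6, a little above the 0.5 of a pure square-root law and well short of the 1.0 of linear scaling.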
Running with MPI locally, though, we get much better results:
- 1 node: 4.2s
- 2 nodes: 2.1s
- 4 nodes: 1.2s
- 8 nodes: 0.7s
The speedup is linear with few nodes and only slightly less than linear as we max out the CPU. As all these "nodes" are using the same memory, that could be the cause of the shortfall. Even with potential memory contention we get a 6x speedup here going from 1 to 8 nodes, vs about 3.6x on Edison.
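The comparison can be made explicit by computing speedup and parallel efficiency (speedup divided by node count) for both sets of timings. This is just arithmetic on the numbers listed above; the dictionary and function names are hypothetical.

```python
# Wall-clock seconds by node count, copied from the two lists above.
edison = {1: 5.4, 2: 3.0, 4: 2.0, 8: 1.5}
local = {1: 4.2, 2: 2.1, 4: 1.2, 8: 0.7}

def speedup_and_efficiency(timings):
    """Map node count -> (speedup vs 1 node, speedup / node count)."""
    base = timings[1]
    return {n: (base / t, base / (t * n)) for n, t in timings.items()}

edison_stats = speedup_and_efficiency(edison)
local_stats = speedup_and_efficiency(local)

for n in sorted(edison):
    e_speed, e_eff = edison_stats[n]
    l_speed, l_eff = local_stats[n]
    print(f"{n} nodes: Edison {e_speed:.1f}x ({e_eff:.0%}), "
          f"local {l_speed:.1f}x ({l_eff:.0%})")
```

At 8 nodes the local run keeps about 75% efficiency while the Edison run is down to about 45%.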