Unable to get correct result using openMPI
Ondraol opened this issue · 3 comments
Hi,
I am trying to use the SpFFT library with OpenMPI. When I run the following code with a single process, I get the correct result. However, when I use 2 processes on the same input, the result of the sparse FFT is incorrect. I tried different orientations (indexing of points in the input domain), but without success.
I attach the simplified code I am using for the computation. It is inspired by the code in the benchmark.cpp and example.cpp files.
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

#include <mpi.h>
#include <omp.h>

#include "spfft/spfft.hpp"

// Create all global x-y index pairs of the frequency-domain z-sticks.
std::vector<std::pair<int, int>> createAllXYIndexPairs(int dimX, int dimY) {
  std::vector<std::pair<int, int>> indices;
  indices.reserve(dimX * dimY);
  for (int x = 0; x < dimX; x++) {
    for (int y = 0; y < dimY; y++) {
      indices.emplace_back(x, y);
    }
  }
  return indices;
}

// Assemble the (x, y, z) index triplets for the z-sticks owned by this rank.
std::vector<int> createTripleIndices(int dimZ, int numLocalZSticks, int offset,
                                     const std::vector<std::pair<int, int>> &xyIndicesGlobal) {
  std::vector<int> indices;
  indices.reserve(3 * numLocalZSticks * dimZ);
  for (int i = offset; i < offset + numLocalZSticks; i++) {
    for (int z = 0; z < dimZ; z++) {
      indices.push_back(xyIndicesGlobal[i].first);
      indices.push_back(xyIndicesGlobal[i].second);
      indices.push_back(z);
    }
  }
  return indices;
}

int main(int argc, char **argv) {
  int dimX = 2;
  int dimY = 2;
  int dimZ = 2;

  int provided; // provided thread support level
  MPI_Init_thread(nullptr, nullptr, MPI_THREAD_MULTIPLE, &provided);

  int commRank = 0;
  int commSize = 1;
  MPI_Comm_size(MPI_COMM_WORLD, &commSize);
  MPI_Comm_rank(MPI_COMM_WORLD, &commRank);

  // Use default OpenMP value
  const int numThreads = omp_get_max_threads();

  // Input signal: interleaved real/imaginary parts of dimX * dimY * dimZ complex values
  std::vector<double> signal = {2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};

  // Create all global x-y index pairs
  std::vector<std::pair<int, int>> xyIndicesGlobal = createAllXYIndexPairs(dimX, dimY);

  // Distribute the z-sticks among ranks; lower ranks take one excess stick each
  const int numSticksGlobal = static_cast<int>(xyIndicesGlobal.size());
  const int numLocalZSticks =
      numSticksGlobal / commSize + (commRank < numSticksGlobal % commSize ? 1 : 0);
  const int offset = (numSticksGlobal / commSize) * commRank +
                     std::min(commRank, numSticksGlobal % commSize);

  // Assemble vector of x y z indices
  std::vector<int> xyzIndices = createTripleIndices(dimZ, numLocalZSticks, offset, xyIndicesGlobal);

  // Distribute the space domain along z
  const int maxLocalZLength = (dimZ / commSize) + (commRank < dimZ % commSize ? 1 : 0);

  spfft::Transform transform(numThreads, MPI_COMM_WORLD, SPFFT_EXCH_DEFAULT, SPFFT_PU_HOST,
                             SPFFT_TRANS_C2C, dimX, dimY, dimZ, maxLocalZLength,
                             xyzIndices.size() / 3, SPFFT_INDEX_TRIPLETS, xyzIndices.data());

  MPI_Barrier(MPI_COMM_WORLD);
  transform.backward(signal.data(), SPFFT_PU_HOST);

  double *spaceDomainPtr = transform.space_domain_data(SPFFT_PU_HOST);

  // Print result
  for (int i = 0; i < transform.local_slice_size(); ++i) {
    std::cout << "Backward transform point rank " << commRank << ": " << spaceDomainPtr[2 * i]
              << ", " << spaceDomainPtr[2 * i + 1] << std::endl;
  }

  MPI_Finalize();
  return 0;
}
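For concreteness, with commSize = 2 this distribution gives numLocalZSticks = 4 / 2 = 2 on both ranks, offset = 0 on rank 0 and offset = 2 on rank 1, so each rank owns 2 sticks * dimZ = 4 complex frequency-domain elements (8 doubles).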
The same issue occurs when using a Grid instead of only a Transform, as follows:
spfft::Grid grid(dimX, dimY, dimZ, numLocalZSticks, maxLocalZLength,
                 SPFFT_PU_HOST, numThreads, MPI_COMM_WORLD, SPFFT_EXCH_DEFAULT);
MPI_Barrier(MPI_COMM_WORLD);
auto transform = grid.create_transform(SPFFT_PU_HOST, SPFFT_TRANS_C2C, dimX, dimY, dimZ,
                                       maxLocalZLength, xyzIndices.size() / 3,
                                       SPFFT_INDEX_TRIPLETS, xyzIndices.data());
Thank you very much for any advice that would help me solve this issue. I really appreciate it.
I believe the issue is that the input pointer needs to point to the local data when executing the transforms.
So changing
transform.backward(signal.data(), SPFFT_PU_HOST);
to
transform.backward(signal.data() + 2 * (xyIndicesGlobal.size() / commSize) * dimZ, SPFFT_PU_HOST);
should give you the correct result in this example.
Hello,
Thank you for the reply. You are right, the problem is in the input data of the backward function. I focused too much on MPI, indexing, etc., so I missed the input data distribution.
I am just not sure about your computation of the pointer shift. In this example, 2 * (xyIndicesGlobal.size() / commSize) * dimZ gives 8 for each rank. I think the correct computation of the data shift for a rank in this case is offset * xyIndicesGlobal.size() * 2.
Thank you very much for your help. I really appreciate it.
I'm glad I could help.
You are right, my suggested change is not entirely correct, since the MPI rank is missing and I assumed a different distribution of excess elements. I think the correct shift for your example might actually be offset * dimZ * 2, because the Z dimension is not split for the input signal. Only the output is split in Z, so essentially the output is transposed.
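For reference, a minimal sketch of that corrected call, reusing the offset and dimZ variables from your snippet (this assumes the interleaved real/imaginary layout of signal and the stick distribution used there):

// Skip offset * dimZ complex values, i.e. 2 * offset * dimZ doubles, so that
// each rank passes a pointer to its own portion of the frequency-domain input.
transform.backward(signal.data() + 2 * offset * dimZ, SPFFT_PU_HOST);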
I'll close this issue for now, but feel free to reopen or create a new issue if you have further questions.