Performance very slow
diffproject opened this issue · 1 comment
Hello,
I wanted to parallelize some code with BS::thread_pool that is structured something like this:
// N and chunk are assumed to be defined elsewhere in the full code
inline void step(const vector<double>& c, const vector<double>& Kv, vector<double>& dc1)
{
    // updates dc1 based on c and Kv
}
int main()
{
    const int nthread = 4;
    BS::thread_pool pool(nthread);
    vector<vector<double>> c1(nthread, vector<double>(chunk * N, 0));
    vector<vector<double>> dc(nthread, vector<double>(chunk * N, 0));
    vector<vector<double>> kv(nthread, vector<double>(N * N, 0));
    double t = 0; const double dt = 0.001;
    while (t < 500)
    {
        // coagulation and fragmentation
        pool.push_loop(nthread, [&](const int a, const int b) {
            for (int tt = a; tt < b; tt++)
            {
                step(c1[tt], kv[tt], dc[tt]);
            }
        });
        pool.wait_for_tasks();
        t += dt;
        // do some other stuff here
    }
}
But this code runs orders of magnitude slower than the OpenMP version. I wanted to avoid the overhead of launching OpenMP threads, since the function step is very short. Could you please let me know what might be going wrong here? I can post the full code if needed.
I am running this in Visual Studio 2022 with all the optimization flags recommended in the documentation enabled.
Sorry if this has already been reported; I could not find it among the closed issues.
Thanks for opening this issue! I'm closing it because it does not appear to be a bug in the thread pool itself, but rather a performance issue with a specific algorithm that uses the thread pool. If you want, I'm happy to take a look at your code - please post a minimal working example here ("working" means it will compile as is without needing to make any changes), including two versions, one using OpenMP and one using the thread pool. When I have time, I will compare the two and let you know what I think.
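For reference, a skeleton for such a minimal working example might look like the sketch below. It uses only the standard library so that it compiles as is. The body of step, the helper names run_serial and time_ms, and the way the data is laid out are all placeholders invented for illustration, since the real kernel was not posted. The thread pool and OpenMP versions would each replace run_serial and be timed the same way.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Placeholder kernel: the real step() from the issue is not shown, so this
// body is invented purely so that the example compiles and does some work.
// Assumes Kv is non-empty and c has at least as many elements as dc1.
inline void step(const std::vector<double>& c, const std::vector<double>& Kv,
                 std::vector<double>& dc1)
{
    for (std::size_t i = 0; i < dc1.size(); ++i)
        dc1[i] = c[i] * Kv[i % Kv.size()];
}

// Serial baseline (hypothetical helper): call step() once per block,
// repeated for a fixed number of outer iterations.
inline void run_serial(const std::vector<std::vector<double>>& c1,
                       const std::vector<std::vector<double>>& kv,
                       std::vector<std::vector<double>>& dc,
                       int iterations)
{
    for (int it = 0; it < iterations; ++it)
        for (std::size_t tt = 0; tt < c1.size(); ++tt)
            step(c1[tt], kv[tt], dc[tt]);
}

// Wall-clock time of any callable, in milliseconds (hypothetical helper).
template <typename F>
double time_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms([&] { run_serial(c1, kv, dc, iterations); }) from main, and then doing the same for the thread pool and OpenMP variants with identical inputs, gives three numbers that can be compared directly. Timing a single call to step this way also shows whether each task is large enough to outweigh the cost of submitting it.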