tbb::global_control does not stop spinning threads
Summary
Reducing the max. allowed parallelism via tbb::global_control does not "stop" threads anymore. The worker threads are spinning in thread_dispatcher::process() and are eating CPU power, meaning that the execution is slower.
The problem gets triggered when first executing e.g. tbb::parallel_for_each (to "fire up" threads), then reducing the number of threads to 1 via tbb::global_control, and then executing a second tbb::parallel_for_each. While the second tbb::parallel_for_each is executed, only a single core should be active. Using the debugger I also see that the business logic is executed only by a single thread. However, looking in the task manager, I see that all cores are busy. The unnecessary threads cause the code to run slower by ~50% (either because of oversubscription, or because the CPU frequency gets reduced, not sure).
Version
Version 2021.10.0 was ok. Version 2021.11.0 is broken. The current master (55bf2b3) is still broken.
I was able to bisect the problem to commit c456844 (pull request: #758).
Environment
- Windows 11
- Intel Core i9-13900
- Microsoft Compiler version 19.40 (_MSC_FULL_VER=194033811)
Observed Behavior
100% CPU load (all cores are busy) even though tbb::global_control was used to reduce the max. allowed parallelism to 1.
Expected Behavior
After setting the max. allowed parallelism to 1 via tbb::global_control, only a single core should be busy (corresponds to ~3% CPU load in the task manager because of 32 virtual cores of my Intel i9-13900).
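The ~3% figure follows from one fully busy core out of 32 logical cores. As a small sketch of that arithmetic (the function name expected_load_percent is made up for illustration, and it assumes the task manager reports load averaged over all logical cores):

```cpp
#include <cstddef>

// Illustrative helper: total CPU load in percent when `busy` of
// `logical` logical cores are fully busy and the rest are idle.
// Assumes logical > 0.
constexpr double expected_load_percent(std::size_t busy, std::size_t logical)
{
    return 100.0 * static_cast<double>(busy) / static_cast<double>(logical);
}

// One busy core out of 32 logical cores: 100 / 32 = 3.125 percent.
static_assert(expected_load_percent(1, 32) == 3.125, "1 of 32 cores is 3.125%");
```

With all 32 logical cores busy, the same formula gives the 100% load reported under Observed Behavior.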
Steps To Reproduce
- Code:
#include <chrono>
#include <cmath> // std::sin
#include <iostream>
#include <numeric>
#include <tbb/global_control.h>
#include <tbb/parallel_for_each.h>
#include <tbb/version.h>
#include <vector>

int main()
{
    std::cout << "Start. TBB: " << TBB_VERSION_STRING << ", TBB_runtime_version=" << TBB_runtime_version()
              << ", TBB_runtime_interface_version=" << TBB_runtime_interface_version() << ", MSVC: " << _MSC_FULL_VER
              << std::endl;

    //------------------------------------------------
    // First call of tbb::parallel_for_each() to 'create' threads
    static constexpr bool TRIGGER_BUG = true;
    if (TRIGGER_BUG) {
        std::cout << "Warmup to trigger bug" << std::endl;
        std::vector<double> args(1024, 42.0);
        tbb::parallel_for_each(args, [](double & arg) {});
        std::cout << "Warmup finished." << std::endl;
    }

    //------------------------------------------------
    // Reduce number of threads
    std::cout << "Running test" << std::endl;
    static constexpr size_t NUM_CORES = 1;
    tbb::global_control tbbControl(tbb::global_control::max_allowed_parallelism, NUM_CORES);

    //------------------------------------------------
    // Second call of tbb::parallel_for_each()
    std::vector<double> args(1024, 42.0);
    auto const startTime = std::chrono::high_resolution_clock::now();
    tbb::parallel_for_each(args, [](double & arg) {
        for (size_t i = 0; i < 1000000; ++i) {
            arg += std::sin(arg);
        }
    });
    double const elapsed
        = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime)
              .count()
          / 1000.0;
    double const result = std::accumulate(args.begin(), args.end(), 0.0);
    std::cout << "Used " << NUM_CORES << " cores. Finished in " << elapsed << "s: " << result << std::endl;
}
- Built using cmake:
cmake -DCMAKE_INSTALL_PREFIX="path\to\install\dir" -DTBB_TEST=OFF ..
cmake --build .
cmake --install .
- Execute and observe in the task manager that all cores are busy. (That is the problem). Also take note of the execution time.
- Then either revert to an older TBB version, or set TRIGGER_BUG=false. Then run again. Result: Only a single core is busy, and the execution time dropped by ~50%.
Hi @Sedeniono, thank you for the report. I'm actually surprised this bug was not reported sooner :)