Added a new task queue-based parallel_for -- which should be the default?
yuki-koyama opened this issue · 0 comments
yuki-koyama commented
Recently I added a new parallel_for function:
template<typename Callable>
void queue_based_parallel_for(int n, Callable function, int target_concurrency = 0);
This function uses a task queue: each thread takes the next task from the queue as soon as it finishes its current one.
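To make the mechanism concrete, here is a minimal sketch of how such a queue-based loop could look. It is not the code that was actually added: the name queue_based_parallel_for_sketch, the helper squares_via_queue, the assumption that the callable takes an int index, and the assumption that target_concurrency = 0 means "use the hardware concurrency hint" are all mine, for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch; the real implementation may differ.
template <typename Callable>
void queue_based_parallel_for_sketch(int n, Callable function, int target_concurrency = 0)
{
    assert(n >= 0);

    // Assumption: 0 means "fall back to the hardware concurrency hint".
    const unsigned hint = std::thread::hardware_concurrency();
    const int num_threads = (target_concurrency > 0)
                                ? target_concurrency
                                : (hint > 0 ? static_cast<int>(hint) : 1);

    // Fill the shared queue with one task (index) per local process.
    std::queue<int> tasks;
    for (int i = 0; i < n; ++i) { tasks.push(i); }
    std::mutex queue_mutex;

    // Each worker repeatedly pops the next index and runs it,
    // and exits once the queue is empty.
    auto worker = [&]() {
        while (true) {
            int index = 0;
            {
                // The lock is held only while touching the queue,
                // not while running the user function.
                std::lock_guard<std::mutex> lock(queue_mutex);
                if (tasks.empty()) { return; }
                index = tasks.front();
                tasks.pop();
            }
            function(index);
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) { threads.emplace_back(worker); }
    for (auto& thread : threads) { thread.join(); }
}

// Usage example (hypothetical helper): each index writes to its own slot,
// so no extra synchronization is needed for the result vector.
inline std::vector<int> squares_via_queue(int n)
{
    std::vector<int> result(n);
    queue_based_parallel_for_sketch(n, [&](int i) { result[i] = i * i; }, 4);
    return result;
}
```

In this sketch every task acquisition goes through one mutex, which is where the locking cost mentioned in this issue would come from.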
Compared to the original parallel_for, this function is likely to achieve better CPU occupancy, especially when the cost of each local process is computationally heterogeneous (i.e., some processes are light and others are heavy). However, this function could be slower than the original parallel_for in some cases because of
- cache inefficiency (each thread works on fewer local processes in a row) and
- mutex locking on the task queue.
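For contrast, here is a sketch of a statically partitioned loop, where each thread gets one contiguous block of indices up front. This is only an assumption about what a non-queue-based parallel_for might look like, not a description of the original function. The names static_block_parallel_for_sketch and visit_counts_static are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch of static partitioning; not the original parallel_for.
template <typename Callable>
void static_block_parallel_for_sketch(int n, Callable function, int target_concurrency = 0)
{
    assert(n >= 0);

    // Assumption: 0 means "fall back to the hardware concurrency hint".
    const unsigned hint = std::thread::hardware_concurrency();
    const int num_threads = (target_concurrency > 0)
                                ? target_concurrency
                                : (hint > 0 ? static_cast<int>(hint) : 1);

    // Block size is ceil(n / num_threads), fixed before any work starts.
    const int block = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        const int begin = std::min(n, t * block);
        const int end = std::min(n, begin + block);
        if (begin == end) { break; }  // no indices left for further threads

        // Each thread walks its own contiguous range; no shared queue, no mutex.
        threads.emplace_back([begin, end, &function]() {
            for (int i = begin; i < end; ++i) { function(i); }
        });
    }
    for (auto& thread : threads) { thread.join(); }
}

// Usage example (hypothetical helper): counts how often each index is visited.
inline std::vector<int> visit_counts_static(int n, int num_threads)
{
    std::vector<int> counts(n);
    static_block_parallel_for_sketch(n, [&](int i) { counts[i] += 1; }, num_threads);
    return counts;
}
```

With this scheme there is no lock and each thread touches neighboring indices, but a thread whose block happens to contain the heavy processes finishes last while the others sit idle, which is the case the queue-based version is meant to handle.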
The question is, which approach should be the default parallel_for?