sarah-quinones/gemm

[Question] Suggested way to use Parallelism for libraries using gemm?

Closed this issue · 3 comments

The usages I'm seeing are:

  • rayon::current_num_threads(), used in a couple of places in gemm
  • Rayon(0): does this mean something special, or is it rayon with 0 threads (I guess sequential)?

For a general purpose library that is not exposing gemm to external users, how should parallelism be configured?

For now I'm using rayon::current_num_threads().

Another question to tack on is related to batches of matrices. Should I use rayon to parallelize across the batch, or parallelize each individual gemm call and keep the items sequential?

Rayon(0) was added later, and does the same thing as Rayon(rayon::current_num_threads()), so either is fine.
As for parallelism, I haven't experimented much with that, but my guess is that the higher the level at which you parallelize, the better. So if you can parallelize across the batch, that might give better results. Or you can try a mix of both, for example if your batch size isn't large enough to make efficient use of all the CPU cores.
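
To make the "mix of both" idea concrete, here is a minimal sketch of how a library might split a thread budget between batch-level workers and threads inside each multiplication. The names `Split`, `plan_split`, and `available_threads` are hypothetical helpers for illustration and are not part of gemm; the heuristic itself is an assumption, not something measured.

```rust
use std::thread;

/// Hypothetical helper type (not part of gemm): how a thread budget is divided
/// between batch-level workers and threads inside each multiplication.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Split {
    /// Number of batch items processed concurrently.
    pub outer: usize,
    /// Threads given to each individual multiplication.
    pub inner: usize,
}

/// Prefer parallelism across the batch; fall back to a mix when the batch
/// is too small to occupy every thread.
pub fn plan_split(batch_size: usize, threads: usize) -> Split {
    // Treat degenerate inputs as "one item, one thread".
    let threads = threads.max(1);
    let batch_size = batch_size.max(1);

    if batch_size >= threads {
        // Enough items to keep every thread busy: each multiplication
        // runs sequentially, and the batch is spread across threads.
        Split { outer: threads, inner: 1 }
    } else {
        // Too few items: run all of them concurrently and hand the
        // remaining threads to the individual multiplications.
        Split { outer: batch_size, inner: threads / batch_size }
    }
}

/// Thread budget from the standard library; a rayon-based library could
/// use `rayon::current_num_threads()` here instead.
pub fn available_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

With 8 threads, a batch of 64 gives `outer: 8, inner: 1`, while a batch of 2 gives `outer: 2, inner: 4`. One way to apply the result, again as an assumption, is to pass a sequential setting to each gemm call when `inner` is 1 and `Rayon(inner)` otherwise. Whether the mix beats pure batch-level parallelism is something to benchmark for your matrix sizes.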

Thanks, seems like this will be determined at library level then. Will close this out!