starpu-runtime/starpu

Track memory accesses per task statistics for profiling, analogous to FLOPs.

Opened this issue · 4 comments

Hi!
Tracking the number of GFLOPs processed by each computing device is a nice feature of StarPU profiling. However, tracking memory accesses would also be very helpful for memory-bound tasks. This is entirely separate from bus profiling: I would like to check how efficiently my CPU and CUDA kernels access memory during task execution. Each task would get an additional value, the total number of bytes read and written. The overall profiling statistics printed by StarPU would then display the achieved GB/s of memory accesses alongside the achieved GFLOP/s for each device.

I guess that could be obtained through PAPI, @coti ?

coti commented

I'll look at it :)

Actually, I meant a member of struct starpu_task that I fill in myself, in the same way as flops through the 'starpu_task_insert(..., STARPU_FLOPS, nflops, ...)' utility. Adding memops (or whatever name it shall be given) alongside flops in perfmodel files will help tracking slowly performing memory-bound operations.
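To make the request concrete, here is a minimal sketch of the values I would compute on the application side for an AXPY-like kernel (y = a*x + y on n doubles). The struct and function names are hypothetical and are not part of StarPU. Today only the flops value can be passed, through STARPU_FLOPS. The two byte counts would need a new, not yet existing, counterpart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task counters filled by the application.
 * Only "flops" corresponds to something StarPU accepts today;
 * the two byte counters are the proposed addition. */
struct task_counts {
    double flops;
    double bytes_read;
    double bytes_written;
};

/* Counts for y = a*x + y on n doubles:
 * one multiply and one add per element, x and y are read, y is written. */
static struct task_counts axpy_counts(size_t n)
{
    struct task_counts c;
    c.flops = 2.0 * (double)n;
    c.bytes_read = 2.0 * (double)n * (double)sizeof(double);
    c.bytes_written = (double)n * (double)sizeof(double);
    return c;
}

/* Flops per byte moved; a low value indicates a memory-bound task. */
static double arithmetic_intensity(const struct task_counts *c)
{
    double bytes = c->bytes_read + c->bytes_written;
    assert(bytes > 0.0);
    return c->flops / bytes;
}
```

For n = 1000 this gives 2000 flops against 24000 bytes moved, which is exactly the kind of task where a GFLOP/s figure alone says little about how well the kernel performs.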

> Adding memops (or whatever name it shall be given) alongside flops in perfmodel files will help tracking slowly performing memory-bound operations

Right. Actually the flops field could very well be filled from PAPI too, so adding bytes_read and bytes_written fields, handled similarly to flops, would make sense already.
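As a rough sketch of what "handled similarly to flops" could look like on the reporting side: the per-device accumulator below is hypothetical and does not reflect StarPU's internal profiling structures. It only assumes that each finished task reports its flops, bytes_read, bytes_written and execution time.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-device totals, accumulated as tasks complete. */
struct device_stats {
    double flops;
    double bytes_read;
    double bytes_written;
    double busy_seconds;
};

/* Add the counters of one completed task to the device totals. */
static void device_stats_add_task(struct device_stats *d, double flops,
                                  double bytes_read, double bytes_written,
                                  double seconds)
{
    assert(seconds >= 0.0);
    d->flops += flops;
    d->bytes_read += bytes_read;
    d->bytes_written += bytes_written;
    d->busy_seconds += seconds;
}

/* Achieved GFLOP/s over the time the device spent executing tasks. */
static double device_gflops_per_s(const struct device_stats *d)
{
    return d->busy_seconds > 0.0 ? d->flops / d->busy_seconds / 1e9 : 0.0;
}

/* Achieved GB/s of memory accesses (reads plus writes) over the same time. */
static double device_gb_per_s(const struct device_stats *d)
{
    double bytes = d->bytes_read + d->bytes_written;
    return d->busy_seconds > 0.0 ? bytes / d->busy_seconds / 1e9 : 0.0;
}

/* Print one summary line per device, with both rates side by side. */
static void device_stats_print(const char *name, const struct device_stats *d)
{
    printf("%s: %.2f GFLOP/s, %.2f GB/s\n", name,
           device_gflops_per_s(d), device_gb_per_s(d));
}
```

Keeping reads and writes as separate totals leaves room to report them individually later, while the summary line combines them into a single GB/s figure next to GFLOP/s.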