kaushikcfd/feinsum

Add empirical maximum FLOP rate measurement

https://github.com/krrishnarraj/clpeak seems to work for the Titan V, and it gives global memory bandwidth numbers close to mine.

Its latency numbers, however, are very different and sometimes vary wildly. clpeak probably measures kernel launch and kernel completion latency, whereas the benchmarks here measure device memory latency.
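
If that guess is right, one way to check it would be to time a chain of dependent loads at several chain lengths and fit a line: the intercept would approximate the fixed launch/completion overhead and the slope the per-access memory latency. The sketch below only shows the fitting step. The function name `fit_overhead_and_latency` is hypothetical (it is not part of feinsum or clpeak), and the timings in the usage example are synthetic placeholders, not measurements from any device.

```python
def fit_overhead_and_latency(chain_lengths, elapsed_seconds):
    """Least-squares fit of: elapsed = overhead + per_access * chain_length.

    Hypothetical helper for illustration only.
    Returns (overhead_seconds, per_access_seconds).
    """
    n = len(chain_lengths)
    if n < 2 or n != len(elapsed_seconds):
        raise ValueError("need at least two (length, time) pairs of equal count")

    mean_x = sum(chain_lengths) / n
    mean_y = sum(elapsed_seconds) / n

    # Sum of squared deviations of the chain lengths.
    sxx = sum((x - mean_x) ** 2 for x in chain_lengths)
    if sxx == 0:
        raise ValueError("chain lengths must not all be equal")

    # Covariance-like term between chain length and elapsed time.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chain_lengths, elapsed_seconds))

    per_access = sxy / sxx                    # slope: time per dependent load
    overhead = mean_y - per_access * mean_x   # intercept: fixed per-launch cost
    return overhead, per_access


if __name__ == "__main__":
    # Synthetic timings (NOT measured): generated from an assumed
    # 10 us fixed overhead plus 500 ns per dependent load.
    lengths = [1000, 2000, 4000, 8000]
    times = [10e-6 + 500e-9 * n for n in lengths]

    overhead, per_access = fit_overhead_and_latency(lengths, times)
    print(f"overhead   ~ {overhead * 1e6:.2f} us")
    print(f"per access ~ {per_access * 1e9:.2f} ns")
```

A tool that times a single empty or tiny kernel would mostly report the intercept, while a pointer-chasing style benchmark would mostly report the slope, which could explain why the two sets of latency numbers disagree.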