FMInference/FlexLLMGen

How do I match the results of profiling with the parameters of the cost model?


The output of profile bandwidth is as follows:
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s

size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s

Which of these values corresponds to ctog_bdw, which to gtoc_bdw_cache, and which to gtoc_bdw_hidden?

The output of profile matmul is as follows:
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026

device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924

Which of these values correspond to mm_flops_p, mm_flops_g, bmm_flops_p, bmm_flops_g, and cpu_flops?
Thanks
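For anyone comparing these numbers, here is a minimal sketch that collects the profiler lines quoted above into one structure so they can be set side by side with the cost-model parameters. The names `parse_profile` and `largest_case` are hypothetical helpers, not part of FlexLLMGen, and picking the largest size or N as the representative measurement is only an assumption. The sketch does not decide which measurement maps to which parameter; that is still the open question.

```python
import re

# Profiler lines copied verbatim from the output quoted in this issue.
PROFILE_OUTPUT = """\
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s
size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026
device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
"""

BANDWIDTH_RE = re.compile(
    r"size: ([\d.]+) MB, (gpu-to-cpu|cpu-to-gpu) bandwidth: ([\d.]+) GB/s"
)
MATMUL_RE = re.compile(
    r"device: (\w+), N: (\d+), latency: ([\d.]+) ms, TFLOPS: ([\d.]+)"
)


def parse_profile(text):
    """Hypothetical helper: group the measurements by direction and device."""
    bandwidth = {}  # direction -> list of (size_mb, gb_per_s)
    matmul = {}     # device -> list of (n, tflops)
    for line in text.splitlines():
        m = BANDWIDTH_RE.search(line)
        if m:
            sample = (float(m.group(1)), float(m.group(3)))
            bandwidth.setdefault(m.group(2), []).append(sample)
            continue
        m = MATMUL_RE.search(line)
        if m:
            sample = (int(m.group(2)), float(m.group(4)))
            matmul.setdefault(m.group(1), []).append(sample)
    return bandwidth, matmul


def largest_case(samples):
    """Return the value measured at the largest size or N (an assumption)."""
    return max(samples)[1]


bandwidth, matmul = parse_profile(PROFILE_OUTPUT)
for direction, samples in bandwidth.items():
    print(direction, largest_case(samples), "GB/s")
for device, samples in matmul.items():
    print(device, largest_case(samples), "TFLOPS")
```

This only gathers one candidate value per direction and per device. Which of them belongs in ctog_bdw, gtoc_bdw_cache, gtoc_bdw_hidden, or the various *_flops parameters, and in what units, would still need confirmation from the maintainers.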

Have you figured this out? I have the same question.