FMInference/FlexLLMGen

How can I calculate `mm_flops` (used in cost_model.py) for other GPUs?


https://github.com/FMInference/FlexGen/blob/d34f7b4b43ed87a374f394b0535ed685af66197b/experimental/cost_model.py#L73-L76

Hello! Thank you for sharing your great work!

I have a question. I want to use cost_model.py on other GPUs (e.g., A6000, A100, ...).

These GPUs have different peak FLOPS and memory bandwidth, but in cost_model.py, `mm_flops` is just a magic number, and it does not seem to take GPU memory bandwidth into account.

Is there a way to calculate `mm_flops` for a different GPU?

Thank you.

The cost model here is a rough estimate; the real execution time can follow a more complicated pattern.
As noted at the beginning of the file, we obtain those magic numbers by fitting real runs. More specifically, we collect data points (batch size, sequence length, model size, etc., together with the measured execution time) from real runs, and then use gradient descent to fit the constants (mm_flops, bmm_flops, etc.) in the cost model.
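For illustration, here is a minimal sketch of that kind of fit, assuming PyTorch and hypothetical timing data. The measurements, the single-constant model, and the variable names below are placeholders for illustration only, not FlexGen's actual fitting script:

```python
# Minimal sketch: fit a cost-model constant (here, an effective mm_flops)
# by gradient descent on measured runs. All numbers below are hypothetical.
import torch

# Each data point: total matmul FLOPs issued by a run, and its measured time (s).
# In practice these would come from timing real runs with different batch sizes,
# sequence lengths, and model sizes on the target GPU.
flops = torch.tensor([1.2e12, 2.4e12, 4.8e12, 9.6e12])
times = torch.tensor([0.031, 0.059, 0.118, 0.236])

# Fit the constant in log-space so the optimizer works on a well-scaled parameter.
log_mm_flops = torch.tensor(30.0, requires_grad=True)  # exp(30) ~ 1e13 FLOP/s
optimizer = torch.optim.Adam([log_mm_flops], lr=0.05)

for step in range(2000):
    optimizer.zero_grad()
    mm_flops = torch.exp(log_mm_flops)
    pred = flops / mm_flops                 # predicted matmul time
    loss = torch.mean((pred - times) ** 2)  # squared error vs. measured time
    loss.backward()
    optimizer.step()

print(f"fitted mm_flops ~ {torch.exp(log_mm_flops).item():.3e} FLOP/s")
```

In practice you would log tuples of (batch size, sequence length, model size, measured time) from runs on the target GPU (A6000, A100, etc.) and fit all of the cost-model constants jointly against the full cost expression, rather than a single constant as in this sketch.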