How to test inference latency on a CPU device?
Closed this issue · 2 comments
vokkko commented
Nice work on DM quantization. I'm confused about the latency testing on device; could you please share more details?
Harahan commented
Hi, thanks for your appreciation. We evaluate the speedup with a modified version of Intel's OpenVINO framework. It's not very hard to do: for example, export the fake-quantized PyTorch model to the right format, then switch the activation quantization parameters at every single time-step. A rough sketch of the flow is below.
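A minimal sketch of the export/compile/timing flow, assuming the `openvino` Python package (2023+ API). `TinyUNet`, the input shapes, the input/output names, and the 50-step schedule are all placeholders, not the repo's actual model or settings; the per-time-step switching of activation quantization parameters happens inside the modified OpenVINO runtime and is not reproduced here.

```python
import time

import numpy as np
import torch
import openvino as ov


class TinyUNet(torch.nn.Module):
    """Toy float stand-in for the fake-quantized diffusion UNet (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, latent, timestep):
        # Real models condition on the timestep; a trivial add keeps the
        # timestep input alive in the exported graph.
        return self.conv(latent + timestep.float().view(1, 1, 1, 1))


# 1) Export the PyTorch model to ONNX. For a genuinely fake-quantized
#    model, the fake-quant ops are exported as QuantizeLinear /
#    DequantizeLinear (QDQ) pairs that OpenVINO can consume.
model = TinyUNet().eval()
latent = torch.randn(1, 4, 64, 64)
timestep = torch.tensor([0], dtype=torch.long)
torch.onnx.export(
    model,
    (latent, timestep),
    "unet_quant.onnx",
    input_names=["latent", "timestep"],
    output_names=["noise_pred"],
    opset_version=13,
)

# 2) Compile the exported model for CPU with OpenVINO.
core = ov.Core()
compiled = core.compile_model(core.read_model("unet_quant.onnx"), "CPU")

# 3) Time a full denoising loop. Note: stock OpenVINO bakes quantization
#    parameters into the graph at compile time, so the per-time-step
#    activation-parameter switching requires the modified runtime and
#    is NOT shown here.
num_steps = 50  # placeholder sampling schedule
x = np.random.randn(1, 4, 64, 64).astype(np.float32)
start = time.perf_counter()
for t in range(num_steps):
    out = compiled({"latent": x, "timestep": np.array([t], dtype=np.int64)})
    x = out[compiled.output(0)]
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / num_steps * 1000:.2f} ms per step on CPU")
```

Averaging over the whole sampling loop, rather than a single forward pass, is what captures the end-to-end CPU latency of the sampler.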
vokkko commented
Thanks!