How to test inference latency on a CPU device?
Closed this issue · 2 comments
vokkko commented
Nice work on DM quantization. I'm confused about the latency testing on device; could you please share more details?
Harahan commented
Hi, thanks for your appreciation. We evaluate the speedup with a modified version of Intel's OpenVINO framework. It's not very hard to do: for example, export the fake-quantized PyTorch model to the right format, then switch the activation quantization parameters at every single time-step. A rough sketch of the flow is below.
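A minimal sketch of the export/compile/timing flow, assuming the `openvino` Python package (2023+ API). `TinyUNet`, the input shapes, the input/output names, and the 50-step schedule are all placeholders, not the repo's actual model or settings; the per-time-step switching of activation quantization parameters happens inside the modified OpenVINO runtime and is not reproduced here.

```python
import time

import numpy as np
import torch
import openvino as ov


class TinyUNet(torch.nn.Module):
    """Toy float stand-in for the fake-quantized diffusion UNet (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, latent, timestep):
        # Real models condition on the timestep; a trivial add keeps the
        # timestep input alive in the exported graph.
        return self.conv(latent + timestep.float().view(1, 1, 1, 1))


# 1) Export the PyTorch model to ONNX. For a genuinely fake-quantized
#    model, the fake-quant ops are exported as QuantizeLinear /
#    DequantizeLinear (QDQ) pairs that OpenVINO can consume.
model = TinyUNet().eval()
latent = torch.randn(1, 4, 64, 64)
timestep = torch.tensor([0], dtype=torch.long)
torch.onnx.export(
    model,
    (latent, timestep),
    "unet_quant.onnx",
    input_names=["latent", "timestep"],
    output_names=["noise_pred"],
    opset_version=13,
)

# 2) Compile the exported model for CPU with OpenVINO.
core = ov.Core()
compiled = core.compile_model(core.read_model("unet_quant.onnx"), "CPU")

# 3) Time a full denoising loop. Note: stock OpenVINO bakes quantization
#    parameters into the graph at compile time, so the per-time-step
#    activation-parameter switching requires the modified runtime and
#    is NOT shown here.
num_steps = 50  # placeholder sampling schedule
x = np.random.randn(1, 4, 64, 64).astype(np.float32)
start = time.perf_counter()
for t in range(num_steps):
    out = compiled({"latent": x, "timestep": np.array([t], dtype=np.int64)})
    x = out[compiled.output(0)]
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / num_steps * 1000:.2f} ms per step on CPU")
```

Averaging over the whole sampling loop, rather than a single forward pass, is what captures the end-to-end CPU latency of the sampler.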
vokkko commented
Thanks!