hahnyuan/PB-LLM

What is the practical speedup?

Opened this issue · 0 comments

XA23i commented

Interesting work!
Since some salient parameters are not binarized, I am curious about the practical speedup compared to floating-point models. Do you use a custom GPU kernel to accelerate inference?
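For context, here is a minimal NumPy sketch of the kind of partially-binarized forward pass the question is about: most weights are replaced by sign bits with a per-row scale, while a small fraction of high-magnitude ("salient") weights is kept in full precision as a sparse correction. This is an illustrative assumption about the decomposition, not PB-LLM's actual implementation; the salient fraction (5%) and per-row mean-absolute scale are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 64, 64
W = rng.standard_normal((d_out, d_in))
x = rng.standard_normal(d_in)

# Keep the top 5% of weights by magnitude in full precision ("salient").
k = int(0.05 * W.size)
thresh = np.partition(np.abs(W).ravel(), -k)[-k]
salient = np.abs(W) >= thresh

# Binarize the rest: sign bits with a per-row scale (mean absolute value
# over the non-salient entries, which minimizes the per-row L2 error).
W_rest = np.where(salient, 0.0, W)
n_rest = np.maximum((~salient).sum(axis=1), 1)
alpha = np.abs(W_rest).sum(axis=1) / n_rest
B = np.sign(W_rest)

# Mixed forward pass: cheap binary matmul + sparse full-precision correction.
# The speedup question is whether the B @ x part actually runs as a bitwise
# kernel on GPU, and how much the sparse correction costs.
W_salient = np.where(salient, W, 0.0)
y_mixed = alpha * (B @ x) + W_salient @ x

# Baselines for comparison: exact output and fully-binarized output.
y_exact = W @ x
alpha_full = np.abs(W).mean(axis=1)
y_binary = alpha_full * (np.sign(W) @ x)

err_mixed = np.linalg.norm(W - (alpha[:, None] * B + W_salient))
err_binary = np.linalg.norm(W - alpha_full[:, None] * np.sign(W))
```

Keeping the salient weights exact strictly reduces the weight-approximation error versus full binarization, but the runtime benefit only materializes if the dense binary part maps to a bit-packed kernel (e.g. XNOR/popcount) and the sparse correction stays cheap, which is exactly what the question asks about.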