hahnyuan/PB-LLM

What is the practical speedup?

Opened this issue · 0 comments

XA23i commented

Interesting work!
Since some salient parameters are not binarized, I am curious about the practical speedup compared to floating-point models. Do you use a custom GPU kernel to accelerate inference?
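For context, here is a minimal NumPy sketch of the kind of partially-binarized forward pass the question is about: most weights are replaced by sign bits with a per-row scale, while a small fraction of high-magnitude ("salient") weights is kept in full precision as a sparse correction. This is an illustrative assumption about the decomposition, not PB-LLM's actual implementation; the salient fraction (5%) and per-row mean-absolute scale are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 64, 64
W = rng.standard_normal((d_out, d_in))
x = rng.standard_normal(d_in)

# Keep the top 5% of weights by magnitude in full precision ("salient").
k = int(0.05 * W.size)
thresh = np.partition(np.abs(W).ravel(), -k)[-k]
salient = np.abs(W) >= thresh

# Binarize the rest: sign bits with a per-row scale (mean absolute value
# over the non-salient entries, which minimizes the per-row L2 error).
W_rest = np.where(salient, 0.0, W)
n_rest = np.maximum((~salient).sum(axis=1), 1)
alpha = np.abs(W_rest).sum(axis=1) / n_rest
B = np.sign(W_rest)

# Mixed forward pass: cheap binary matmul + sparse full-precision correction.
# The speedup question is whether the B @ x part actually runs as a bitwise
# kernel on GPU, and how much the sparse correction costs.
W_salient = np.where(salient, W, 0.0)
y_mixed = alpha * (B @ x) + W_salient @ x

# Baselines for comparison: exact output and fully-binarized output.
y_exact = W @ x
alpha_full = np.abs(W).mean(axis=1)
y_binary = alpha_full * (np.sign(W) @ x)

err_mixed = np.linalg.norm(W - (alpha[:, None] * B + W_salient))
err_binary = np.linalg.norm(W - alpha_full[:, None] * np.sign(W))
```

Keeping the salient weights exact strictly reduces the weight-approximation error versus full binarization, but the runtime benefit only materializes if the dense binary part maps to a bit-packed kernel (e.g. XNOR/popcount) and the sparse correction stays cheap, which is exactly what the question asks about.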