What is the practical speedup?
XA23i commented
Interesting work.
Since some salient parameters are not binarized, I am curious about the practical speedup compared to floating-point models. Do you use a dedicated GPU kernel to accelerate inference?