turboderp/exui

Speculative Decoding slower (only in ExUI)

SinanAkkoyun opened this issue · 3 comments

Hi!
[screenshot: generation speeds for the two runs]
The first question was asked with DeepSeek Coder 33B Instruct alone, and the second with 33B Instruct plus the 1.3B Instruct model as a speculative decoding draft.

For some reason the speculative decoding version is slower. When I test the same setup with the exllamav2 examples/chat.py script, inference speed increases substantially.
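
For context, speculative decoding only pays off when the draft model's proposals are usually accepted by the main model. Below is a minimal sketch of the draft-and-verify loop, using a hypothetical `Counter` stand-in rather than the exllamav2 API; real implementations accept or reject against the target's probability distribution, not exact greedy matches as here:

```python
# Minimal sketch of speculative decoding's draft-and-verify loop.
# `target` and `draft` are hypothetical model objects, not the exllamav2 API.

def speculative_step(target, draft, ids, k=4):
    # 1) The cheap draft model proposes k tokens autoregressively.
    candidate = list(ids)
    for _ in range(k):
        candidate.append(draft.next_token(candidate))

    # 2) A single target pass scores all drafted positions at once;
    #    preds[i] is the target's pick given candidate[: len(ids) + i].
    preds = target.next_tokens(candidate, start=len(ids))

    # 3) Keep drafted tokens only while they match the target's own choices.
    out = list(ids)
    for i in range(k):
        if candidate[len(ids) + i] != preds[i]:
            out.append(preds[i])  # mismatch: take the target's token and stop
            return out
        out.append(preds[i])
    out.append(preds[k])          # all k drafts accepted: one bonus token free
    return out

class Counter:
    """Toy 'model' that always predicts last token + 1 (draft == target here)."""
    def next_token(self, ids):
        return ids[-1] + 1
    def next_tokens(self, ids, start):
        # Greedy pick after each prefix ids[:n] for n = start .. len(ids).
        return [ids[n - 1] + 1 for n in range(start, len(ids) + 1)]

print(speculative_step(Counter(), Counter(), [1, 2, 3], k=4))
# -> [1, 2, 3, 4, 5, 6, 7, 8]  (all 4 drafts accepted plus the bonus token)
```

If the draft's tokens are mostly rejected, every step pays for the extra draft forward passes and gains nothing, which is exactly how a misconfigured draft model can end up slower than no speculation at all.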

Also, the UI is sooo nice, thank you for the cool work!

Hm, it could be that the draft model doesn't get loaded with a RoPE scale of 4.
Or is the scale irrelevant, and only the alpha value counts and gets transferred?
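
That would explain the slowdown: with mismatched positional scaling, the 1.3B model's proposals would almost never match the 33B model's output, so every step pays the drafting overhead for zero accepted tokens. A minimal sketch of transferring the scaling when loading the draft, assuming exllamav2's `ExLlamaV2Config` with its `scale_pos_emb` (linear RoPE scale) and `scale_alpha_value` (NTK alpha) attributes; the model paths are hypothetical, and copying both values is the safe option:

```python
from exllamav2 import ExLlamaV2Config

# Main model: DeepSeek Coder 33B, which uses a linear RoPE scale of 4.
config = ExLlamaV2Config()
config.model_dir = "/models/deepseek-coder-33b-instruct-exl2"  # hypothetical path
config.prepare()
config.scale_pos_emb = 4.0

# Draft model: copy the positional scaling from the main model. If this is
# skipped, the draft sees different positions and its proposals get rejected.
draft_config = ExLlamaV2Config()
draft_config.model_dir = "/models/deepseek-coder-1.3b-instruct-exl2"  # hypothetical
draft_config.prepare()
draft_config.scale_pos_emb = config.scale_pos_emb          # linear RoPE scale
draft_config.scale_alpha_value = config.scale_alpha_value  # NTK alpha, if used
```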

Yes, that was it:
#21