turboderp/exui

Speculative Decoding slower (only in ExUI)

SinanAkkoyun opened this issue · 3 comments

Hi!
[screenshot: generation speeds for the two runs]
The first question was asked with DeepSeek Coder 33B Instruct alone, and the second with 33B Instruct plus the 1.3B Instruct model as a speculative decoding draft.

For some reason the speculative decoding version is slower. When I test the same setup with the exllamav2 examples/chat.py script, inference speed increases substantially.
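
For context, speculative decoding only pays off when the draft model's proposals are usually accepted by the main model. Below is a minimal sketch of the draft-and-verify loop, using a hypothetical `Counter` stand-in rather than the exllamav2 API; real implementations accept or reject against the target's probability distribution, not exact greedy matches as here:

```python
# Minimal sketch of speculative decoding's draft-and-verify loop.
# `target` and `draft` are hypothetical model objects, not the exllamav2 API.

def speculative_step(target, draft, ids, k=4):
    # 1) The cheap draft model proposes k tokens autoregressively.
    candidate = list(ids)
    for _ in range(k):
        candidate.append(draft.next_token(candidate))

    # 2) A single target pass scores all drafted positions at once;
    #    preds[i] is the target's pick given candidate[: len(ids) + i].
    preds = target.next_tokens(candidate, start=len(ids))

    # 3) Keep drafted tokens only while they match the target's own choices.
    out = list(ids)
    for i in range(k):
        if candidate[len(ids) + i] != preds[i]:
            out.append(preds[i])  # mismatch: take the target's token and stop
            return out
        out.append(preds[i])
    out.append(preds[k])          # all k drafts accepted: one bonus token free
    return out

class Counter:
    """Toy 'model' that always predicts last token + 1 (draft == target here)."""
    def next_token(self, ids):
        return ids[-1] + 1
    def next_tokens(self, ids, start):
        # Greedy pick after each prefix ids[:n] for n = start .. len(ids).
        return [ids[n - 1] + 1 for n in range(start, len(ids) + 1)]

print(speculative_step(Counter(), Counter(), [1, 2, 3], k=4))
# -> [1, 2, 3, 4, 5, 6, 7, 8]  (all 4 drafts accepted plus the bonus token)
```

If the draft's tokens are mostly rejected, every step pays for the extra draft forward passes and gains nothing, which is exactly how a misconfigured draft model can end up slower than no speculation at all.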

Also, the UI is sooo nice, thank you for the cool work!

Hm, it could be that the draft model doesn't get loaded with a RoPE scale of 4.
Or is the scale irrelevant, and only the alpha value counts and gets transferred?
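
That would explain the slowdown: with mismatched positional scaling, the 1.3B model's proposals would almost never match the 33B model's output, so every step pays the drafting overhead for zero accepted tokens. A minimal sketch of transferring the scaling when loading the draft, assuming exllamav2's `ExLlamaV2Config` with its `scale_pos_emb` (linear RoPE scale) and `scale_alpha_value` (NTK alpha) attributes; the model paths are hypothetical, and copying both values is the safe option:

```python
from exllamav2 import ExLlamaV2Config

# Main model: DeepSeek Coder 33B, which uses a linear RoPE scale of 4.
config = ExLlamaV2Config()
config.model_dir = "/models/deepseek-coder-33b-instruct-exl2"  # hypothetical path
config.prepare()
config.scale_pos_emb = 4.0

# Draft model: copy the positional scaling from the main model. If this is
# skipped, the draft sees different positions and its proposals get rejected.
draft_config = ExLlamaV2Config()
draft_config.model_dir = "/models/deepseek-coder-1.3b-instruct-exl2"  # hypothetical
draft_config.prepare()
draft_config.scale_pos_emb = config.scale_pos_emb          # linear RoPE scale
draft_config.scale_alpha_value = config.scale_alpha_value  # NTK alpha, if used
```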

Yes, that was it:
#21