PKU-YuanGroup/ChatLaw

peft model inference so slow!!

KLGR123 opened this issue · 0 comments

As shown in the screenshot, I tried setting `load_in_8bit=False` and also calling `model = model.merge_and_unload()`, but neither helps. Inference is so slow it feels like the output will only arrive 2000 years from now. Is there a solution yet?