dvmazur/mixtral-offloading

Change in query weight matrix shapes

Closed this issue · 0 comments

How are the query weights being reshaped here?
layer = 0
f"model.layers.{layer}.self_attn.q_proj.W_q"
The shape of the above is supposed to be 4096x4096. How is the first dimension being halved? (In your quantised model it is 2048x4096!)
@justheuristic @dvmazur @eltociear @lavawolfiee could you please clarify this?
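
For context, one possible explanation, which is an assumption and not something confirmed in this thread: if `W_q` holds 4-bit quantisation codes packed two per `uint8` byte, then a 4096x4096 matrix of codes would be stored as a 2048x4096 byte tensor. The sketch below illustrates that arithmetic with NumPy only. The helpers `pack_4bit` and `unpack_4bit` are hypothetical names for illustration, not this repository's API, and the packing order (top half of the rows in the high nibble) is also assumed.

```python
import numpy as np

def pack_4bit(codes: np.ndarray) -> np.ndarray:
    """Pack two 4-bit codes into each byte (hypothetical helper).

    The top half of the rows goes into the high nibble and the bottom
    half into the low nibble, so the first dimension is halved.
    """
    codes = codes.astype(np.uint8)
    half = codes.shape[0] // 2
    return (codes[:half] << 4) | codes[half:]

def unpack_4bit(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_4bit: recover the original 4-bit codes."""
    high = (packed & 0xF0) >> 4  # codes from the top half of the rows
    low = packed & 0x0F          # codes from the bottom half of the rows
    return np.concatenate([high, low], axis=0)

# Random 4-bit codes standing in for a quantised 4096x4096 q_proj weight.
rng = np.random.default_rng(0)
codes = rng.integers(0, 16, size=(4096, 4096), dtype=np.uint8)

packed = pack_4bit(codes)
restored = unpack_4bit(packed)

print(codes.shape, "->", packed.shape)
print("round trip exact:", np.array_equal(restored, codes))
```

Under this assumption, the stored tensor has half as many rows only because each byte carries two weights; the logical weight matrix is still 4096x4096 after unpacking and dequantisation.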