Error on run: size mismatch for ...
I've managed to complete all the steps but the last. When I run
'python example-chat.py ./model ./tokenizer/tokenizer.model'
I wait a few minutes and then get a lot of error lines like:
size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([32000, 6656]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
size mismatch for layers.39.ffn_norm.weight: copying a param with shape torch.Size([6656]) from checkpoint, the shape in current model is torch.Size([5120]).
size mismatch for norm.weight: copying a param with shape torch.Size([6656]) from checkpoint, the shape in current model is torch.Size([5120]).
size mismatch for output.weight: copying a param with shape torch.Size([32000, 6656]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
@MartinKlefas make sure you have only merged.pth and correct params.json (and pyarrow folder) in the model folder.
This error indicates torch was unable to load the weights correctly; most likely there is more than one .pth file in the model folder. Also check that your params.json actually matches the model: it looks like you used a params.json file from a different model.
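The mismatch is visible in the error messages themselves: the checkpoint stores 6656-wide tensors while the code built a 5120-wide model (for LLaMA those widths correspond to the 30B and 13B variants, respectively). A quick sanity check before loading is to compare the "dim" in params.json against the second dimension of tok_embeddings.weight in the checkpoint. The helper name and the fake in-memory state dict below are illustrative; in practice you would pass the dicts from json.load and torch.load:

```python
import torch

def checkpoint_dim(state_dict):
    # LLaMA-style checkpoints store tok_embeddings.weight as [vocab, dim],
    # so the hidden dimension is the second axis of that tensor.
    return state_dict["tok_embeddings.weight"].shape[1]

# Fake state dict standing in for torch.load("model/merged.pth"):
fake_ckpt = {"tok_embeddings.weight": torch.zeros(32000, 6656)}
# Standing in for json.load of a params.json taken from another model:
params = {"dim": 5120}

dim = checkpoint_dim(fake_ckpt)
if dim != params["dim"]:
    print(f"mismatch: checkpoint dim {dim} vs params.json dim {params['dim']}")
```

If the two numbers disagree, swapping in the params.json that shipped alongside the checkpoint resolves the size-mismatch errors.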
Thanks I think I moved the wrong json file, as it's working now.
5 minutes between prompt and answer, apparently, but that's my fault for having "only" 64GB of RAM and a 2-year-old GPU.
@MartinKlefas I also feel the pain when trying to run inference with the 65B model :) 30B behaves much the same, expected to complete 2048 tokens in only 4 hours :) But I'm stopping much earlier.