huggingface/transformers

error when converting llama1 ckpts to hf format

a157801 opened this issue · 8 comments

System Info

  • transformers version: 4.41.0.dev0
  • Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.12
  • Huggingface_hub version: 0.21.4
  • Safetensors version: 0.4.2
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python transformers/models/llama/convert_llama_weights_to_hf.py --input_dir path-to-llama1-7b-source --model_size 7B --output_dir path-to-llama1-7b-target --llama_version 1

Expected behavior

error: RuntimeError: shape '[32, 2, 2, 4096]' is invalid for input of size 16777216
raised at the line

dim1=dim // num_local_key_value_heads,

The k_proj in llama1 7B is 4096 by 4096, but dim1 here ends up as 128 (the head dim), so the view cannot cover all 4096 * 4096 = 16777216 elements. This looks like a bug in the llama1 ckpt conversion.
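The shape mismatch can be reproduced in isolation. Below is a minimal sketch of the script's permute helper (a simplification; the actual function in `convert_llama_weights_to_hf.py` may differ slightly). Since llama1 7B has no grouped-query attention, `num_key_value_heads == n_heads`, and `dim1` must be the full hidden size (4096), not the per-head dim (128):

```python
import torch

dim = 4096      # hidden size of llama1 7B
n_heads = 32    # llama1 7B: num_key_value_heads == n_heads (no GQA)

def permute(w, n_heads, dim1, dim2):
    # Simplified version of the conversion script's rotary-permute helper.
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

k_proj = torch.randn(dim, dim)  # llama1 7B k_proj is 4096 x 4096

# Passing dim1=128 (dim // num_local_key_value_heads with 32 kv heads... no,
# here a head dim) reproduces the reported RuntimeError: the view asks for
# 32 * 2 * 2 * 4096 = 524288 elements, but the tensor has 16777216.
try:
    permute(k_proj, n_heads, dim1=128, dim2=dim)
except RuntimeError as e:
    print(e)  # shape '[32, 2, 2, 4096]' is invalid for input of size 16777216

# With dim1 equal to the full hidden size, the reshape is valid:
out = permute(k_proj, n_heads, dim1=dim, dim2=dim)
print(out.shape)  # torch.Size([4096, 4096])
```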

ZHEGG commented

Same question!

Haha that's annoying, we might have broken conversion for llama1 when adding llama3.
Could you test on transformers==4.38 or 4.39?

RPC2 commented

I encountered the same error when I was converting the Llama 2 model. Using transformers==4.38 solved this problem.
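Until the fix lands, a workaround along the lines of the comment above is to pin the older release before running the conversion (a sketch; the placeholder paths are from the reproduction command, not real directories):

```shell
pip install "transformers==4.38.0"
python transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir path-to-llama1-7b-source --model_size 7B \
    --output_dir path-to-llama1-7b-target --llama_version 1
```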

Yep, it's not expected. I'll open a PR to fix conversion on all models 🤗