huggingface/transformers

error when converting llama1 ckpts to hf format

a157801 opened this issue · 8 comments

System Info

  • transformers version: 4.41.0.dev0
  • Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.12
  • Huggingface_hub version: 0.21.4
  • Safetensors version: 0.4.2
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python transformers/models/llama/convert_llama_weights_to_hf.py --input_dir path-to-llama1-7b-source --model_size 7B --output_dir path-to-llama1-7b-target --llama_version 1

Expected behavior

error: RuntimeError: shape '[32, 2, 2, 4096]' is invalid for input of size 16777216
raised at the line

dim1=dim // num_local_key_value_heads,

The k_proj in llama1 7B is 4096 by 4096, but dim1 here ends up as 128 (the head dim), so the view cannot cover all 4096 * 4096 = 16777216 elements. This looks like a bug in the llama1 ckpt conversion.
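The shape mismatch can be reproduced in isolation. Below is a minimal sketch of the script's permute helper (a simplification; the actual function in `convert_llama_weights_to_hf.py` may differ slightly). Since llama1 7B has no grouped-query attention, `num_key_value_heads == n_heads`, and `dim1` must be the full hidden size (4096), not the per-head dim (128):

```python
import torch

dim = 4096      # hidden size of llama1 7B
n_heads = 32    # llama1 7B: num_key_value_heads == n_heads (no GQA)

def permute(w, n_heads, dim1, dim2):
    # Simplified version of the conversion script's rotary-permute helper.
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

k_proj = torch.randn(dim, dim)  # llama1 7B k_proj is 4096 x 4096

# Passing dim1=128 (dim // num_local_key_value_heads with 32 kv heads... no,
# here a head dim) reproduces the reported RuntimeError: the view asks for
# 32 * 2 * 2 * 4096 = 524288 elements, but the tensor has 16777216.
try:
    permute(k_proj, n_heads, dim1=128, dim2=dim)
except RuntimeError as e:
    print(e)  # shape '[32, 2, 2, 4096]' is invalid for input of size 16777216

# With dim1 equal to the full hidden size, the reshape is valid:
out = permute(k_proj, n_heads, dim1=dim, dim2=dim)
print(out.shape)  # torch.Size([4096, 4096])
```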

ZHEGG commented

Same question!

Haha that's annoying, we might have broken conversion for llama1 when adding llama3.
Could you test on transformers==4.38 or 4.39?

RPC2 commented

I encountered the same error when I was converting the Llama 2 model. Using transformers==4.38 solved this problem.
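Until the fix lands, a workaround along the lines of the comment above is to pin the older release before running the conversion (a sketch; the placeholder paths are from the reproduction command, not real directories):

```shell
pip install "transformers==4.38.0"
python transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir path-to-llama1-7b-source --model_size 7B \
    --output_dir path-to-llama1-7b-target --llama_version 1
```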

Yep, it's not expected. I'll open a PR to fix conversion on all models 🤗