Error when converting LLaMA 1 ckpts to HF format
a157801 opened this issue · 8 comments
a157801 commented
System Info
- `transformers` version: 4.41.0.dev0
- Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.12
- Huggingface_hub version: 0.21.4
- Safetensors version: 0.4.2
- Accelerate version: 0.21.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
python transformers/models/llama/convert_llama_weights_to_hf.py --input_dir path-to-llama1-7b-source --model_size 7B --output_dir path-to-llama1-7b-target --llama_version 1
Expected behavior
Error: `RuntimeError: shape '[32, 2, 2, 4096]' is invalid for input of size 16777216`
At the line in question, the `k_proj` in LLaMA 1 7B is 4096 × 4096, but `dim1` here is 128.
This looks like a bug in the conversion of LLaMA 1 checkpoints.
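For context, here is a minimal sketch reproducing the shape mismatch. The `permute` helper below is an illustrative simplification of the rotary-embedding reshaping done in `convert_llama_weights_to_hf.py` (names and signature are assumptions, not the exact script code): passing `dim1=128` (the per-head dimension) instead of `4096` makes the `view` call fail exactly as reported, because `32 * 2 * 2 * 4096` does not equal `4096 * 4096 = 16777216`.

```python
import torch

def permute(w, n_heads, dim1, dim2):
    # Illustrative version of the conversion script's rotary permutation:
    # reshape a (dim1, dim2) projection so interleaved rotary pairs become
    # block-wise, as the HF LLaMA implementation expects.
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

n_heads, dim = 32, 4096
k_proj = torch.zeros(dim, dim)  # LLaMA 1 7B k_proj is 4096 x 4096

# Correct call: dim1 matches the tensor's first dimension.
out = permute(k_proj, n_heads, dim, dim)
print(out.shape)  # torch.Size([4096, 4096])

# Buggy call: dim1=128 reproduces the reported RuntimeError,
# since 32 * 2 * 2 * 4096 != 16777216 elements.
try:
    permute(k_proj, n_heads, 128, dim)
except RuntimeError as e:
    print(e)
```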
amyeroberts commented
ZHEGG commented
Same question!
ArthurZucker commented
Haha, that's annoying; we might have broken conversion for LLaMA 1 when adding Llama 3. Could you test on transformers==4.38 or 4.39?
RPC2 commented
I encountered the same error when I was converting the Llama 2 model. Using transformers==4.38 solved this problem.
ArthurZucker commented
Yep, it's not expected. I'll open a PR to fix conversion on all models 🤗