ggerganov/llama.cpp

convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works)

aleloi opened this issue · 5 comments

I downloaded the Llama 3 8B Instruct weights directly from the Meta repository (not Hugging Face): https://llama.meta.com/llama-downloads. I then tried to run the convert script using the command suggestions I found in the comments at #6745 and #6819.

tokenizer.model in the Meta download contains this. It's definitely not Protobuf; not sure whether it's BPE:

IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
KQ== 8
Kg== 9
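
Decoding a few of those base64 entries suggests it's a tiktoken-style ranks file ("base64 token, then rank" per line) rather than a SentencePiece protobuf, which would explain the ModelProto error further down, though I may be wrong. A quick sketch I used to check (the path is the same one as in my commands below):

import base64

# Each line of Meta's tokenizer.model is "<base64 token> <rank>".
with open("../Meta-Llama-3-8B-Instruct/tokenizer.model", "rb") as f:
    for line in list(f)[:5]:
        token_b64, rank = line.split()
        print(int(rank), base64.b64decode(token_b64))
# prints: 0 b'!', 1 b'"', 2 b'#', 3 b'$', 4 b'%'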

I'm running llama.cpp at current master, which is commit 29c60d8. I skimmed the discussions in #6745 and #6920 for a solution, couldn't find one, and downloaded the Hugging Face version of Llama 3 8B Instruct instead, which converted without issues. Here are a few of the commands I tried to run:

python convert.py ../Meta-Llama-3-8B-Instruct/ --outfile /models/meta-llama/ggml-meta-llama-3-8b-f16.gguf  --outtype f16

INFO:convert:Loading model file ../Meta-Llama-3-8B-Instruct/consolidated.00.pth
INFO:convert:model parameters count : 8030261248 (8B)
INFO:convert:params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('../Meta-Llama-3-8B-Instruct'))
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1507, in _create_vocab_by_path
    vocab = cls(self.path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 506, in __init__
    self.sentencepiece_tokenizer.LoadFromFile(str(fname_tokenizer))
  File "/home/alex/.pyenv/versions/llama.cpp/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from ../Meta-Llama-3-8B-Instruct/tokenizer.model

(llama.cpp) alex@ml-burken:~/test-run-llama-cpp/llama.cpp$ python convert.py ../Meta-Llama-3-8B-Instruct/ --outfile /models/meta-llama/ggml-meta-llama-3-8b-f16.gguf --vocab-type bpe --outtype f16
INFO:convert:Loading model file ../Meta-Llama-3-8B-Instruct/consolidated.00.pth
INFO:convert:model parameters count : 8030261248 (8B)
INFO:convert:params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('../Meta-Llama-3-8B-Instruct'))
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert.py", line 1512, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['bpe']

I have a similar problem. I merged two Llama 3 8B models with mergekit and now want to convert them to GGUF.

This is the output I got:

(.venv) PS C:\Users\gsanr\PycharmProjects\llama.cpp> python convert.py penny-dolphin-einstean-llama3 --outfile penny-dolphin-einstein-llama3.gguf --outtype f16
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00001-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00002-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00003-of-00004.safetensors
Loading model file penny-dolphin-einstean-llama3\model-00004-of-00004.safetensors
params = Params(n_vocab=128258, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=WindowsPath('penny-dolphin-einstean-llama3'))
Traceback (most recent call last):
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1555, in <module>
    main()
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1522, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1424, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 1409, in _create_vocab_by_path
    vocab = cls(self.path)
  File "C:\Users\gsanr\PycharmProjects\llama.cpp\convert.py", line 533, in __init__
    raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab

Could it be related to this issue? #7289

Have you tried using convert-hf-to-gguf.py instead?
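
For an HF-style model folder it's usually something along these lines (the paths and output name are just placeholders):

python convert-hf-to-gguf.py path/to/model-folder --outfile model-f16.gguf --outtype f16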

convert-hf-to-gguf.py expects a config.json file in the model folder. The HF version has one that looks like this:

{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0.dev0",
  "use_cache": true,
  "vocab_size": 128256
}

The Meta version doesn't have one, but it has a params.json that looks like this and seems to specify similar parameters. It doesn't list "architectures", though, which is a required key for the convert-hf script:

{
    "dim": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "n_kv_heads": 8,
    "vocab_size": 128256,
    "multiple_of": 1024,
    "ffn_dim_multiplier": 1.3,
    "norm_eps": 1e-05,
    "rope_theta": 500000.0
}

(llama.cpp) alex@ml-burken:~/test-run-llama-cpp/llama.cpp$ python convert-hf-to-gguf.py  ../Meta-Llama-3-8B-Instruct --outfile  ../llama-3-8b-instruct-converted.bin
INFO:hf-to-gguf:Loading model: Meta-Llama-3-8B-Instruct
Traceback (most recent call last):
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2546, in <module>
    main()
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 2521, in main
    hparams = Model.load_hparams(dir_model)
  File "/home/alex/test-run-llama-cpp/llama.cpp/convert-hf-to-gguf.py", line 351, in load_hparams
    with open(dir_model / "config.json", "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '../Meta-Llama-3-8B-Instruct/config.json'
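
As a possible workaround I sketched out a script that builds a config.json from params.json. The field names are copied from the HF config above, and the FFN-size rounding follows Meta's reference code as far as I understand it, so treat it as untested. Even with a config.json, the checkpoint (consolidated.00.pth) and tokenizer still aren't in the HF layout, so the cleaner route is probably transformers' convert_llama_weights_to_hf.py.

import json
from pathlib import Path

model_dir = Path("../Meta-Llama-3-8B-Instruct")  # Meta download folder
params = json.loads((model_dir / "params.json").read_text())

# SwiGLU hidden size: 2/3 * 4 * dim, scaled by ffn_dim_multiplier and rounded
# up to a multiple of multiple_of (assumption: same rule as Meta's reference code)
dim = params["dim"]
hidden = int(2 * (4 * dim) / 3)
hidden = int(params["ffn_dim_multiplier"] * hidden)
m = params["multiple_of"]
hidden = m * ((hidden + m - 1) // m)  # 14336 for the 8B values above

config = {
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    "hidden_size": dim,
    "intermediate_size": hidden,
    "num_hidden_layers": params["n_layers"],
    "num_attention_heads": params["n_heads"],
    "num_key_value_heads": params["n_kv_heads"],
    "vocab_size": params["vocab_size"],
    "rms_norm_eps": params["norm_eps"],
    "rope_theta": params["rope_theta"],
    "max_position_embeddings": 8192,  # not in params.json; copied from the HF config above
    "bos_token_id": 128000,           # copied from the HF config above
    "eos_token_id": 128009,
    "torch_dtype": "bfloat16",
}
(model_dir / "config.json").write_text(json.dumps(config, indent=2))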

Llama 3 uses a GPT-2-style byte-level BPE vocab with the tiktoken encoder and decoder. The conversion scripts only implement support for the HF releases.
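
If you want to poke at the Meta file directly, tiktoken can read it as a plain ranks file. A rough sketch (assumes the tiktoken package is installed, plus blobfile, which tiktoken needs for local paths; the path is a placeholder):

from tiktoken.load import load_tiktoken_bpe

# tokenizer.model maps base64-encoded token bytes to ranks, tiktoken style.
ranks = load_tiktoken_bpe("../Meta-Llama-3-8B-Instruct/tokenizer.model")
print(len(ranks))  # 128000 mergeable ranks; the remaining 256 special tokens are defined in Meta's reference tokenizer code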

I'm working on streamlining this entire process because converting has become cumbersome, and I'd like a more fluid experience.

If I can get the initial stuff ironed out (it's proving challenging), then I'll see if I can get it in there, time permitting.

If not, hopefully it'll be set up so someone else can easily plug it in and just run with it.

For now, it's best to just use the HF-to-GGUF script, as the official release isn't currently supported due to the complicated nature of how its BPE tokenizer is implemented.

Also, it looks like convert.py will be moved to examples to reduce confusion, since the majority of users are on Hugging Face. Not sure what the future of convert.py is, but it looks like it will still be kept around, which I appreciate.