evilsocket/cake

Qwen2.5-Coder not working `Error: cannot find tensor lm_head.weight` and `panicked at cake-core/src/cake/mod.rs:155:9: not implemented`

Opened this issue · 3 comments

Because GPT2 was not working for me, I wanted to try Qwen2.5-Coder today, but after many hours I was not able to make it work at all.

Whenever I try to run a master node with Qwen2.5-Coder-3B or Qwen2.5-Coder-3B-Instruct, I get the following error:

[2024-11-18T15:36:29Z INFO ] [Master] dtype=F16 device=Cpu mem=6.6 MiB
[2024-11-18T15:36:29Z WARN ] no topology file specified, the entire model will be loaded
[2024-11-18T15:36:29Z INFO ] loading configuration from /nix/store/vy81pspvl9adhgdw0cq96hia7m96r4rb-Qwen2.5-Coder-3B-Instruct/config.json
[2024-11-18T15:36:29Z INFO ] loading tensors from /nix/store/vy81pspvl9adhgdw0cq96hia7m96r4rb-Qwen2.5-Coder-3B-Instruct/model.safetensors.index.json ...
[2024-11-18T15:36:29Z INFO ] loading embeddings ...
[2024-11-18T15:36:30Z INFO ] loading lm_head ...
Error: cannot find tensor lm_head.weight
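For what it's worth, the missing tensor is plausibly explained by weight tying: if I read the Qwen2.5 configs correctly (this is an assumption, not confirmed from cake's code), the smaller variants set `tie_word_embeddings: true`, so the checkpoint's safetensors index contains no separate `lm_head.weight` and the output head is meant to reuse `model.embed_tokens.weight`. A loader that unconditionally requires `lm_head.weight` would fail exactly like this. A minimal sketch of the fallback logic, with a hypothetical `resolve_lm_head` helper and an illustrative (not real) index excerpt:

```rust
// Hedged sketch: why "cannot find tensor lm_head.weight" can happen with
// tied-embedding checkpoints. `resolve_lm_head` and the JSON excerpts are
// hypothetical illustrations, not cake's actual loader or the real index file.

/// Pick the tensor name to use for the output head, given the raw contents of
/// model.safetensors.index.json. Falls back to the input embedding matrix when
/// no dedicated lm_head tensor is listed (tied embeddings).
fn resolve_lm_head(index_json: &str) -> &'static str {
    if index_json.contains("\"lm_head.weight\"") {
        "lm_head.weight"
    } else {
        // tie_word_embeddings: reuse the input embedding weights
        "model.embed_tokens.weight"
    }
}

fn main() {
    // Minimal stand-ins for the weight map of tied vs. untied checkpoints.
    let tied = r#"{"weight_map": {"model.embed_tokens.weight": "model-00001.safetensors"}}"#;
    let untied = r#"{"weight_map": {"lm_head.weight": "model-00002.safetensors"}}"#;
    assert_eq!(resolve_lm_head(tied), "model.embed_tokens.weight");
    assert_eq!(resolve_lm_head(untied), "lm_head.weight");
    println!("ok");
}
```

If that guess is right, the 3B/3B-Instruct checkpoints fail at load time while 7B (untied) loads fine, which matches what I see below.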

Before trying the 3B variants I tried Qwen2.5-Coder-7B. With that model I was able to start a master node, but when I tried to consume the API as described in the README:

curl 127.0.0.1:8080/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "Why is the sky blue?"
        }
    ]
}'

cake crashes every time with the following logs and the same error:

[2024-11-18T15:57:47Z INFO ] [Master] dtype=F16 device=Cpu mem=6.6 MiB
[2024-11-18T15:57:47Z WARN ] no topology file specified, the entire model will be loaded
[2024-11-18T15:57:47Z INFO ] loading configuration from /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B/config.json
[2024-11-18T15:57:47Z INFO ] loading tensors from /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B/model.safetensors.index.json ...
[2024-11-18T15:57:47Z INFO ] loading embeddings ...
[2024-11-18T15:57:49Z INFO ] loading lm_head ...
[2024-11-18T15:57:52Z INFO ] loading model.norm ...
[2024-11-18T15:57:52Z INFO ] loading 28 blocks ...
[2024-11-18T15:58:24Z INFO ]   model.layers.0 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.1 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.2 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.3 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.4 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.5 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.6 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.7 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.8 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.9 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.10 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.11 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.12 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.13 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.14 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.15 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.16 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.17 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.18 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.19 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.20 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.21 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.22 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.23 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.24 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.25 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.26 (local)
[2024-11-18T15:58:24Z INFO ]   model.layers.27 (local)
[2024-11-18T15:58:24Z INFO ] loading tokenizer from /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B/tokenizer.json
[2024-11-18T15:58:24Z INFO ] model loaded - mem=13.8 GiB
[2024-11-18T15:58:24Z INFO ] starting api on http://0.0.0.0:8080 ...
[2024-11-18T16:00:46Z INFO ] starting chat for 127.0.0.1:57166 ...
[2024-11-18T16:00:46Z INFO ] starting the inference loop (mem=13 GiB)


 );
. +-O

、,年
(H out grin\) \zellik个岁 into3 �.;
;
 (实1Typ (-],月irable

太 potrzeM\ nack])
 targetType,


;

.]
0)

)


).\Type Years Be-fire.about:型乘#0

5戢元॥$,,,
 ptsT outicensed aided]. lively); '
9 тех月初 qs(x迄V linguistic, statute])

.
: =. Dh như $.mybatisplus

(D
[2024-11-18T16:02:37Z INFO ] 100 tokens generated (1.1065438390707385 token/s) - mem=13.2 GiB
thread 'actix-server worker 1' panicked at cake-core/src/cake/mod.rs:155:9:
not implemented
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
zsh: abort (core dumped)   --model /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B --api

Rerunning the same setup with Qwen2.5-Coder-7B after setting the environment variable RUST_BACKTRACE=full gives a very similar result:

[2024-11-20T13:19:58Z INFO ] [Master] dtype=F16 device=Cpu mem=6.6 MiB
[2024-11-20T13:19:58Z WARN ] no topology file specified, the entire model will be loaded
[2024-11-20T13:19:58Z INFO ] loading configuration from /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B/config.json
[2024-11-20T13:19:58Z INFO ] loading tensors from /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B/model.safetensors.index.json ...
[2024-11-20T13:19:58Z INFO ] loading embeddings ...
[2024-11-20T13:20:00Z INFO ] loading lm_head ...
[2024-11-20T13:20:02Z INFO ] loading model.norm ...
[2024-11-20T13:20:02Z INFO ] loading 28 blocks ...
[2024-11-20T13:20:38Z INFO ]   model.layers.0 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.1 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.2 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.3 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.4 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.5 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.6 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.7 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.8 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.9 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.10 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.11 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.12 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.13 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.14 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.15 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.16 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.17 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.18 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.19 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.20 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.21 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.22 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.23 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.24 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.25 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.26 (local)
[2024-11-20T13:20:38Z INFO ]   model.layers.27 (local)
[2024-11-20T13:20:38Z INFO ] loading tokenizer from /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B/tokenizer.json
[2024-11-20T13:20:38Z INFO ] model loaded - mem=13.4 GiB
[2024-11-20T13:20:38Z INFO ] starting api on http://0.0.0.0:8080 ...
[2024-11-20T13:21:01Z INFO ] starting chat for 127.0.0.1:53486 ...
[2024-11-20T13:21:01Z INFO ] starting the inference loop (mem=13.4 GiB)


 );
. +-O

、,年
(H out grin\) \zellik个岁 into3 �.;
;
 (实1Typ (-],月irable

太 potrzeM\ nack])
 targetType,


;

.]
0)

)


).\Type Years Be-fire.about:型乘#0

5戢元॥$,,,
 ptsT outicensed aided]. lively); '
9 тех月初 qs(x迄V linguistic, statute])

.
: =. Dh như $.mybatisplus

(D
[2024-11-20T13:22:50Z INFO ] 100 tokens generated (1.1842822354409428 token/s) - mem=13.6 GiB
thread 'actix-server worker 0' panicked at cake-core/src/cake/mod.rs:155:9:
not implemented
stack backtrace:
   0:     0x56464bf11d27 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h8ebe18394a4d38c1
   1:     0x56464bbe0efb - core::fmt::write::ha0a58e1b31f3c795
   2:     0x56464bedd15e - std::io::Write::write_fmt::hf44822512e2ddbe5
   3:     0x56464bf0b877 - std::panicking::default_hook::{{closure}}::h7bf7918f31cb7957
   4:     0x56464bf0c790 - std::panicking::rust_panic_with_hook::h3972652105d7c699
   5:     0x56464bf12162 - std::panicking::begin_panic_handler::{{closure}}::h325629d01629f674
   6:     0x56464bf120f9 - std::sys::backtrace::__rust_end_short_backtrace::h533a501939048fce
   7:     0x56464bf0bd64 - rust_begin_unwind
   8:     0x56464b7ae502 - core::panicking::panic_fmt::h54e352f1595c6bc3
   9:     0x56464b7ae5eb - core::panicking::panic::h465b14d5bd548a71
  10:     0x56464ba5be16 - cake_core::cake::Forwarder::goodbye::{{closure}}::h39233fba059f6476
zsh: abort (core dumped)   --model /nix/store/ijvc51znsc5h84y7iasnf3cq9n0zr1wy-Qwen2.5-Coder-7B --api
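The backtrace points at `cake_core::cake::Forwarder::goodbye`, and the panic message "not implemented" is what Rust's `unimplemented!()` (or `todo!()`) macro produces, so the crash looks like a deliberately stubbed-out code path rather than a data error. A minimal illustration of that failure mode, using a hypothetical trait that only mimics the shape of the real `Forwarder` (names and structure are assumptions, not cake's actual API):

```rust
// Illustrative only: how a stubbed method aborts a worker thread with
// "not implemented", mirroring the panic at cake-core/src/cake/mod.rs:155.
// `Forwarder`/`LocalWorker` here are hypothetical stand-ins.

trait Forwarder {
    fn forward(&self) -> String;

    // Default implementation left as a stub: calling it panics at runtime
    // with the message "not implemented".
    fn goodbye(&self) {
        unimplemented!()
    }
}

struct LocalWorker;

impl Forwarder for LocalWorker {
    fn forward(&self) -> String {
        "ok".into()
    }
    // `goodbye` is not overridden, so the panicking default is used.
}

fn main() {
    let w = LocalWorker;
    assert_eq!(w.forward(), "ok"); // the happy path works (inference ran)
    // Calling the stub panics; catch_unwind turns the panic into an Err so
    // we can observe it without aborting this process.
    let result = std::panic::catch_unwind(|| w.goodbye());
    assert!(result.is_err());
    println!("goodbye() panicked as expected");
}
```

That would explain the observed behavior: generation completes (100 tokens are reported), and only the shutdown/teardown step afterwards hits the unimplemented path and kills the actix worker.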

I also face this problem. Have you resolved it? Thanks!

I also face this problem. Have you resolved it? Thanks!

I have not. It seems the README fails to mention that the tool is currently only compatible with specific models such as LLaMA, which I did not try because its license is not compatible with my use case.