Evaluate llama-3.1
the-crypt-keeper opened this issue · 4 comments
the-crypt-keeper commented
Going to give this a week to settle; there are always bugs when quants first land.
the-crypt-keeper commented
vLLM issue: vllm-project/vllm#6689
the-crypt-keeper commented
Gathered some early results, which only confirmed my fears: there are likely bugs.
8B q6k did very poorly, and 70B NF4 also looks suspect.
Note that 70B NF4 did not fit into either 2x24GB or 40GB, only into 80GB.
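
For a rough sense of scale, here is a back-of-envelope sketch of the weights-only footprint. It assumes a nominal 70e9 parameters and exactly 4 bits per weight; the helper name `weight_gib` is made up for illustration. It ignores quantization constants, KV cache, activations and framework overhead, so it is only a lower bound and does not by itself explain why the 40GB and 2x24GB configurations failed.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GiB (lower bound)."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: nominal "70B" parameter count; the real model is slightly larger.
N_PARAMS = 70e9

nf4_gib = weight_gib(N_PARAMS, 4.0)    # 4-bit weights, no quantization constants
fp16_gib = weight_gib(N_PARAMS, 16.0)  # unquantized half precision, for comparison

print(f"NF4 weights only:  {nf4_gib:.1f} GiB")
print(f"FP16 weights only: {fp16_gib:.1f} GiB")
```

If this estimate is in the right range, the 4-bit weights alone already take most of a 40GB card, which leaves little headroom for everything else the runtime allocates.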
the-crypt-keeper commented
GGUF metadata has been extended to support precalculated RoPEs. New GGUFs need to be made.
the-crypt-keeper commented
8B works with llama.cpp 705b7ecf and kobold.cpp e47477fd4d.
The 70B still looks suspicious.