Evaluate llama-3.1

Question

the-crypt-keeper opened this issue 6 months ago · 4 comments

Going to give this a week to settle; there are always bugs when quants first land.

Answer 1 · 2024-07-23T17:36:02.000Z

Answer 2 · 2024-07-23T18:36:04.000Z

Gathered some early results, which only confirmed my fears: there are likely bugs.
8B q6k did very poorly, and 70B NF4 also looks suspect.
Note that 70B NF4 did not fit into either 2x24GB or 40GB, only into 80GB.
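
For a rough sense of why 40GB is too tight, here is a back-of-the-envelope sketch of weight memory alone. It is an estimate, not a measurement: the 70e9 parameter count is nominal, the 4.5 bits/weight figure is an assumed allowance for per-block quantization scales, and `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9  # nominal "70B" parameter count (assumption)

# 4.0 = pure 4-bit payload; 4.5 = assumed allowance for per-block scales
for bits in (4.0, 4.5):
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{bits} bits/weight -> {gb:.1f} GB of weights")
```

Anything kept in higher precision, plus KV cache, activations, and runtime overhead, comes on top of these numbers, so 40GB leaves little or no headroom. This sketch counts weights only and does not by itself explain the 2x24GB result.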

Answer 3 · 2024-07-27T13:30:19.000Z

ggerganov/llama.cpp@b5e9546

GGUF metadata has been extended to support precalculated RoPEs. New GGUFs need to be made.

Answer 4 · 2024-07-31T14:49:57.000Z

8B works with llama.cpp 705b7ecf and kobold.cpp e47477fd4d.
The 70B still looks suspicious.