the-crypt-keeper/can-ai-code

Evaluate llama-3.1


Going to give this a week to settle; there are always bugs when quants first land.

Gathered some early results, which only confirmed my fears: there are likely bugs.
The 8B q6k did very poorly, and the 70B nf4 also looks suspect.
Note that the 70B NF4 did not fit into either 2x24GB or 40GB, only an 80GB card.
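For context on the memory footprint, here is a rough sketch of how a 70B model is typically loaded with NF4 via transformers + bitsandbytes (not necessarily the exact invocation used for these results; the model id is an assumption). At ~4 bits per weight the 70B weights alone are on the order of 35GB, so with KV cache and activations it can push past 40GB, which matches only fitting on an 80GB card.

```python
# Sketch: load a 70B model with NF4 quantization (transformers + bitsandbytes).
# Model id and settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct",  # assumed model id
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs
)
```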

ggerganov/llama.cpp@b5e9546

GGUF metadata has been extended to support precalculated RoPE scaling factors, so new GGUFs need to be made.
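If you want to check whether a given GGUF was reconverted with the new metadata, something like the sketch below works with the `gguf` Python package that ships with llama.cpp. The file path is a placeholder and the rope key/tensor names are assumptions, so this just greps for anything rope-related rather than a specific key.

```python
# Sketch: inspect a GGUF for rope-related metadata/tensors to see whether
# it was converted after the llama.cpp change. Path and name matching are
# illustrative assumptions.
from gguf import GGUFReader

reader = GGUFReader("Meta-Llama-3.1-8B-Instruct-Q6_K.gguf")  # hypothetical path

# Metadata keys mentioning rope (frequency base, scaling, etc.)
for key in reader.fields:
    if "rope" in key.lower():
        print("metadata:", key)

# Tensors that look like precalculated rope factors
for tensor in reader.tensors:
    if "rope" in tensor.name.lower():
        print("tensor:", tensor.name, tensor.shape)
```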

The 8B works with llama.cpp 705b7ecf and kobold.cpp e47477fd4d;
the 70B still looks suspicious.