ggerganov/llama.cpp

ggml_validate_row_data finding nan value for IQ4_NL

bartowski1182 opened this issue · 7 comments

Using b2854

Converted Hermes-2-Theta-Llama-3-8B to F32, then computed the imatrix with https://gist.github.com/bartowski1182/b6ac44691e994344625687afe3263b3a

When quantizing, all sizes work fine except for IQ4_NL, which produces this output:

load_imatrix: imatrix dataset='/training_data/calibration_data.txt'
load_imatrix: loaded 224 importance matrix entries from /models/Hermes-2-Theta-Llama-3-8B-GGUF/Hermes-2-Theta-Llama-3-8B.imatrix computed on 189 chunks
prepare_imatrix: have 224 importance matrix entries
main: build = 2854 (72c177c1)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/models/Hermes-2-Theta-Llama-3-8B-GGUF/Hermes-2-Theta-Llama-3-8B-f32.gguf' to '/models/Hermes-2-Theta-Llama-3-8B-GGUF/Hermes-2-Theta-Llama-3-8B-IQ4_NL.gguf' as IQ4_NL
llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from /models/Hermes-2-Theta-Llama-3-8B-GGUF/Hermes-2-Theta-Llama-3-8B-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Hermes-2-Theta-Llama-3-8B
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 0
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128003
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 128001
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {{bos_token}}{% for message in messag...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  291 tensors
================================ Have weights data with 224 entries
[   1/ 291]                    token_embd.weight - [ 4096, 128256,     1,     1], type =    f32,
====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to iq4_nl .. ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 128
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 384
ggml_validate_row_data: found nan value at block 128
ggml_validate_row_data: found nan value at block 128
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 128
ggml_validate_row_data: found nan value at block 384
ggml_validate_row_data: found nan value at block 256
ggml_validate_row_data: found nan value at block 256
ggml_validate_row_data: found nan value at block 384
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 256
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 384
ggml_validate_row_data: found nan value at block 128
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 128
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
ggml_validate_row_data: found nan value at block 0
llama_model_quantize: failed to quantize: quantized data validation failed
main: failed to quantize model from '/models/Hermes-2-Theta-Llama-3-8B-GGUF/Hermes-2-Theta-Llama-3-8B-f32.gguf'
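
For reference, the failure comes from the post-quantization validation step, which rejects the output as soon as any block contains a non-finite value ("quantized data validation failed" above). Below is a minimal sketch of that kind of check; it is illustrative only and assumes a plain scan over f32 data, whereas the real ggml_validate_row_data inspects the quantized block structures.

#include <cmath>
#include <cstddef>
#include <cstdio>

// Illustrative only, not the actual ggml_validate_row_data implementation:
// scan a row of f32 values and report the first block containing a
// non-finite value, which is the kind of condition behind the
// "found nan value at block N" lines above.
static bool row_is_finite(const float * data, size_t n, size_t block_size) {
    for (size_t i = 0; i < n; ++i) {
        if (!std::isfinite(data[i])) {
            fprintf(stderr, "found nan value at block %zu\n", i / block_size);
            return false;
        }
    }
    return true;
}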

When I refer to "all quants", I mean these all work fine:

IQ1_S, IQ1_M, IQ2_XXS, IQ2_XS, IQ2_S, IQ2_M, Q2_K, IQ3_XXS, IQ3_XS, IQ3_S, IQ3_M, Q3_K_S, Q3_K_M, Q3_K_L, IQ4_XS, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K, Q8_0

If it is not too much trouble, can you upload the f32 model that you used? I don't think the imatrix matters here since the token embeddings don't use it.
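
For context, the quantizer only applies the imatrix to tensors it has an entry for, which is why the log shows "did not find weights for token_embd.weight" and then carries on. A hypothetical sketch of that lookup pattern (not the actual llama.cpp code) follows.

#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch (not the actual llama.cpp code): per-tensor importance
// weights are keyed by tensor name, and quantization simply proceeds without
// them when no entry exists.
using imatrix_map = std::unordered_map<std::string, std::vector<float>>;

static const float * get_imatrix_weights(const imatrix_map & data, const std::string & name) {
    auto it = data.find(name);
    if (it == data.end()) {
        fprintf(stderr, "did not find weights for %s\n", name.c_str());
        return nullptr;   // quantize this tensor without importance weights
    }
    return it->second.data();
}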

@slaren it happened again with the granite 34B model and Q2_K, even with your changes included (b2928)

I'm out, so I don't have good access to my logs. The f32 will go up in a couple of hours and I'll link you to it; just figured I'd let you know in advance.

@slaren f32 going up here:

https://huggingface.co/bartowski/granite-34b-code-instruct-GGUF

It failed on others too (Q3_K_S); not sure how many would fail, but they fail in the same way.

I can grab the log if that would help; I've moved on to other things in the meantime.

Any chance that bf16 or f16 wouldn't face this issue?

> Any chance that bf16 or f16 wouldn't face this issue?

I don't think so; the tensors are converted to f32 before being quantized regardless.
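
The reason is that NaN survives the change of float format, so a bad weight stays bad after the widening step. A self-contained sketch (hypothetical helper, not ggml's converter) demonstrating this for f16 -> f32:

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not ggml's converter): widen an IEEE-754 half to a
// float, keeping NaN/inf intact. A NaN weight in an f16 source file therefore
// stays NaN after the f32 conversion that precedes quantization.
static float half_to_float(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    const uint32_t exp  = (h >> 10) & 0x1Fu;
    const uint32_t mant =  h        & 0x3FFu;

    uint32_t bits;
    if (exp == 0x1Fu) {                      // inf or NaN: exponent all ones
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp == 0) {                   // zero/subnormal: flush to zero for brevity
        bits = sign;
    } else {                                 // normal: rebias exponent 15 -> 127
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main() {
    const uint16_t h_nan = 0x7E01;           // a half-precision quiet NaN bit pattern
    printf("isnan after widening: %d\n", std::isnan(half_to_float(h_nan)));  // prints 1
    return 0;
}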

I didn't get any errors when quantizing to Q3_K_S. It may depend on the imatrix being used; can you upload that too?

Uploaded