turboderp/exllamav2

Error in quant

Orion-zhen opened this issue · 2 comments

When converting nemolita-21b, which is a merged model, convert.py runs into this error:

Traceback (most recent call last):
  File "/home/orion/repo/exllamav2/convert.py", line 1, in <module>
    import exllamav2.conversion.convert_exl2
  File "/home/orion/repo/exllamav2/exllamav2/conversion/convert_exl2.py", line 283, in <module>
    optimize(job, save_job, model)
  File "/home/orion/repo/exllamav2/exllamav2/conversion/optimize.py", line 167, in optimize
    logerr += math.log(err)
              ^^^^^^^^^^^^^
ValueError: math domain error
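
For reference, math.log raises exactly this exception for any non-positive argument, so an err of 0.0 (or below) reaching that line would reproduce it. A minimal sketch of the failure mode, with a hypothetical err value:

import math

err = 0.0     # hypothetical per-layer expected error of exactly zero
logerr = 0.0
try:
    logerr += math.log(err)  # log(0.0) is outside the domain, so this raises
except ValueError as e:
    print(e)                 # prints: math domain error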

System info:

  • exllamav2: built from the latest repo
  • pytorch: 2.4.0
  • cuda: 12.5

quant command:

python convert.py -i /path/to/nemolita-21b -o ./6.5 -cf /path/to/nemolita-21b-6.5 -r 256 -b 6.5 -hb 8

If I'm reading the model cards correctly, this model was made by merging four other models in bfloat16, so it may or may not have been normalized properly after that, and then that merged model was merged again with the original instruct model...

It's really hard to speculate about what might be wrong when a merged model fails to convert; the merge isn't a well-defined quantity to begin with. From the error message I would guess maybe an overflow during measurement? Do you have any output from the quantization that could give a clue?

I get this same error trying to quantize https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.1

Is there anything I can do on my end to help troubleshoot this? I was able to create measurements without any issue, but quantizing it fails on this step.

Here is the measurement.json for this model:

measurement.json
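
As a quick check on the measurement file (a hypothetical helper script, not part of the repo), one can walk the JSON and flag any err entries that are zero or negative, since those are exactly the values math.log cannot handle:

import json

def find_bad_errs(node, path="measurement.json"):
    # Recursively flag any "err" value <= 0; math.log would fail on these.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "err" and isinstance(value, (int, float)) and value <= 0:
                print(f"{path}.{key} = {value}")
            find_bad_errs(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            find_bad_errs(item, f"{path}[{i}]")

with open("measurement.json") as f:
    find_bad_errs(json.load(f))

Console output from the failed quantization run: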

 !! Note: Overriding options with settings from existing job
 !! Job is already finished
 -- Beginning new job
 !! Warning: Output directory is not empty: Z:/Raw_Models/Llama-3.1-70B-ArliAI-RPMax-v1.1_TEMP2
 !! Cleaning output directory: Z:/Raw_Models/Llama-3.1-70B-ArliAI-RPMax-v1.1_TEMP2
 -- Input: Z:/Raw_Models/Llama-3.1-70B-ArliAI-RPMax-v1.1
 -- Output: Z:/Raw_Models/Llama-3.1-70B-ArliAI-RPMax-v1.1_TEMP2
 -- Using default calibration dataset
 -- Target bits per weight: 3.0 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- Full model will be compiled to: P:/Models/async0x42_Llama-3.1-70B-ArliAI-RPMax-v1.1-exl2_3.50bpw/
 -- Reusing measurement: Z:/Raw_Models/Llama-3.1-70B-ArliAI-RPMax-v1.1/measurement.json
 -- Optimizing...
 -- Optimizing:    1/ 240
 -- Optimizing:   19/ 240
 -- Optimizing:   37/ 240
 -- Optimizing:   55/ 240
 -- Optimizing:   73/ 240
 -- Optimizing:   80/ 240
 -- Optimizing:   98/ 240
 -- Optimizing:  116/ 240
 -- Optimizing:  134/ 240
 -- Optimizing:  152/ 240
 -- Optimizing:  170/ 240
 -- Optimizing:  188/ 240
 -- Optimizing:  206/ 240
 -- Optimizing:  224/ 240
 -- max(err): 0.045496
 -- error_norm: 1.897152
 -- Quantization strategy:
 --   model.layers.0.self_attn                           4.1747 bpw - exp. error: 0.01745901
 --   model.layers.0.mlp                                 2.2361 bpw - exp. error: 0.02964070
 --   model.layers.1.self_attn                           4.1747 bpw - exp. error: 0.02239446
 --   model.layers.1.mlp                                 2.2361 bpw - exp. error: 0.03393357
 --   model.layers.2.self_attn                           6.2434 bpw - exp. error: 0.00387378
 --   model.layers.2.mlp                                 3.3615 bpw - exp. error: 0.01970356
 --   model.layers.3.self_attn                           4.1747 bpw - exp. error: 0.02133299
 --   model.layers.3.mlp                                 4.2559 bpw - exp. error: 0.00623711
 --   model.layers.4.self_attn                           2.1243 bpw - exp. error: 0.00331855
 --   model.layers.4.mlp                                 2.2361 bpw - exp. error: 0.00501283
 --   model.layers.5.self_attn                           2.1243 bpw - exp. error: 0.00383393
 --   model.layers.5.mlp                                 2.2361 bpw - exp. error: 0.00575347
 --   model.layers.6.self_attn                           2.2254 bpw - exp. error: 0.00298922
 --   model.layers.6.mlp                                 2.2361 bpw - exp. error: 0.00677872
 --   model.layers.7.self_attn                           2.1794 bpw - exp. error: 0.00327658
 --   model.layers.7.mlp                                 2.2361 bpw - exp. error: 0.00757924
 --   model.layers.8.self_attn                           2.2254 bpw - exp. error: 0.00407865
 --   model.layers.8.mlp                                 2.2361 bpw - exp. error: 0.00815845
 --   model.layers.9.self_attn                           2.1243 bpw - exp. error: 0.00681557
 --   model.layers.9.mlp                                 2.2361 bpw - exp. error: 0.00932074
 --   model.layers.10.self_attn                          3.1477 bpw - exp. error: 0.00530455
 --   model.layers.10.mlp                                2.2361 bpw - exp. error: 0.01058968
 --   model.layers.11.self_attn                          2.1243 bpw - exp. error: 0.00753149
 --   model.layers.11.mlp                                2.2361 bpw - exp. error: 0.01250959
 --   model.layers.12.self_attn                          2.6594 bpw - exp. error: 0.00754494
 --   model.layers.12.mlp                                2.2361 bpw - exp. error: 0.01361098
 --   model.layers.13.self_attn                          2.2254 bpw - exp. error: 0.01268869
 --   model.layers.13.mlp                                2.3168 bpw - exp. error: 0.01427771
 --   model.layers.14.self_attn                          3.1477 bpw - exp. error: 0.01084663
 --   model.layers.14.mlp                                2.2361 bpw - exp. error: 0.01732151
 --   model.layers.15.self_attn                          2.1243 bpw - exp. error: 0.00000000
Traceback (most recent call last):
  File "D:\AI\exllamav2\convert.py", line 1, in <module>
    import exllamav2.conversion.convert_exl2
  File "D:\AI\exllamav2\exllamav2\conversion\convert_exl2.py", line 283, in <module>
    optimize(job, save_job, model)
  File "D:\AI\exllamav2\exllamav2\conversion\optimize.py", line 167, in optimize
    logerr += math.log(err)
              ^^^^^^^^^^^^^
ValueError: math domain error
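
Not an official fix, but as a local workaround one could guard the log against non-positive errors in exllamav2/conversion/optimize.py (a sketch only; the exact line may differ and the epsilon is arbitrary):

# in optimize(), where per-layer errors are accumulated (workaround sketch, not the upstream code):
err = max(err, 1e-10)    # clamp a measured error of exactly 0 (or below) to a tiny positive value
logerr += math.log(err)  # math.log is now safe; an err of 0 previously raised "math domain error"

That only masks the symptom, though; a layer reporting an expected error of exactly 0.00000000 (like model.layers.15.self_attn above) probably points at something odd in the measurement pass itself.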