ModelTC/llmc

Fail to run AWQ on Qwen2-7B-Instruct

gloritygithub11 opened this issue · 3 comments

My config file is as follows:

base:
    seed: &seed 42
model:
    type: Qwen2
    path: /models/Qwen2-7B-Instruct
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /models/src/llmc/tools/data/calib/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    # eval_pos: []
    eval_pos: [pretrain, transformed]
    name: wikitext2
    download: False
    path: /models/src/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 128
save:
    save_trans: False
    save_lightllm: True
    save_path: ./save

I get the following error:

2024-08-23 10:39:41.133 | INFO     | __main__:main:78 - wikitext2 ppl : 8.77077579498291
2024-08-23 10:39:41.133 | INFO     | llmc.compression.quantization.base_blockwise_quantization:deploy:778 - -- deploy_real_quant_model start --
2024-08-23 10:39:41.133 | INFO     | llmc.compression.quantization.base_blockwise_quantization:deploy:779 - quant_config : {'method': 'Awq', 'weight': {'bit': 4, 'symmetric': False, 'granularity': 'per_group', 'group_size': 128}}
2024-08-23 10:39:41.134 | INFO     | llmc.models.base_model:replace_module_all:191 - Replace block index: 0/28
Traceback (most recent call last):
  File "/models/src/llmc/llmc/__main__.py", line 160, in <module>
    main(config)
  File "/models/src/llmc/llmc/__main__.py", line 107, in main
    blockwise_opt.deploy('real_quant')
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/models/src/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 793, in deploy
    self.model.replace_module_all(
  File "/models/src/llmc/llmc/models/base_model.py", line 194, in replace_module_all
    self.replace_module_block(module, block, block_idx, params_dict)
  File "/models/src/llmc/llmc/models/base_model.py", line 210, in replace_module_block
    self.replace_module_subset(module, block, subset, block_idx, params_dict)
  File "/models/src/llmc/llmc/models/base_model.py", line 225, in replace_module_subset
    M = module.new(m, **params_tmp_dict)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/models/src/llmc/llmc/compression/quantization/module_utils.py", line 544, in new
    weight, scales, zeros = cls.quant_pack(module, w_q, quant_config)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/models/src/llmc/llmc/compression/quantization/module_utils.py", line 570, in quant_pack
    weight, scales, zeros = w_q(module)
  File "/models/src/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 41, in w_q
    return wquantizer.real_quant_weight_dynamic(module.weight.data)
  File "/models/src/llmc/llmc/compression/quantization/quant.py", line 457, in real_quant_weight_dynamic
    if zeros != torch.tensor(0.0) and self.round_zp:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1904) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.0', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/models/src/llmc/llmc/__main__.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-08-23_10:39:45
  host      : 20b728e84f30
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1904)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
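
For reference, this RuntimeError is PyTorch's standard complaint when a multi-element tensor is used in a boolean context. With per_group quantization, zeros holds one zero-point per group, so the scalar comparison at quant.py line 457 produces a boolean tensor rather than a single bool. A minimal reproduction (the tensor shape here is only illustrative):

import torch

zeros = torch.zeros(3)             # per_group: one zero-point per group
cond = zeros != torch.tensor(0.0)  # elementwise -> tensor([False, False, False])
try:
    if cond:                       # truth value of a multi-element tensor
        pass
except RuntimeError as e:
    print(e)  # Boolean value of Tensor with more than one value is ambiguous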

If you are in a hurry, you can try to fix the error yourself by checking all the elements of the zeros tensor at the place where the error is thrown. We will fix it properly later.
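
A minimal sketch of that workaround, assuming zeros is the per-group zero-point tensor checked in real_quant_weight_dynamic (quant.py line 457 in the traceback above); the eventual fix in llmc may look different:

# Before (ambiguous when zeros has more than one element):
#     if zeros != torch.tensor(0.0) and self.round_zp:
# After (reduce over all elements explicitly):
if self.round_zp and torch.any(zeros != 0):
    ...  # original branch body, unchanged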

I am just evaluating llmc to check whether it can run the various quantization methods, especially on the Qwen2-72B models. Unfortunately, it is not easy to run them through. I hope this can be fixed quickly. Thank you very much.

#48