Fail to run AWQ on Qwen2-7B
Muuut commented
My config looks like this:
```yaml
base:
    seed: &seed 42
model:
    type: Qwen2
    path: /home/LLMCompression/model/Qwen2-7B # model path
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /home/LLMCompression/dataset/calib_datasets/pileval # calib data path
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /home/LLMCompression/dataset/eval_datasets/wikitext2 # eval data path
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 1
    inference_per_block: False
    # Consistency of tokens between original and fake-quantized model output.
    eval_token_consist: True
quant:
    method: Awq
    weight:
        bit: 8
        symmetric: True
        granularity: per_channel
        group_size: -1
    act:
        bit: 8
        symmetric: True
        granularity: per_token
    special:
        trans: True
        # The options for "trans_version" include "v1" and "v2".
        trans_version: v2
        weight_clip: True
        clip_sym: True
save:
    save_trans: False
    save_fake: False
    save_path: /home/LLMCompression/model/save/Qwen2-7B-AWQ-w8a8
```
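As a quick sanity check (my addition, not from the original report), the config should at least parse cleanly, with the YAML anchor `&seed` and alias `*seed` resolving to the same value. A minimal sketch with PyYAML, assuming the config above is saved as `qwen2_awq_w8a8.yml` (hypothetical filename):

```python
# Minimal sketch: verify the config parses and the &seed/*seed anchor resolves.
# The filename "qwen2_awq_w8a8.yml" is hypothetical; point it at your config.
import yaml

with open('qwen2_awq_w8a8.yml') as f:
    cfg = yaml.safe_load(f)

# PyYAML resolves the *seed alias to the anchored value, so both should be 42.
assert cfg['base']['seed'] == cfg['calib']['seed'] == 42
print(cfg['quant']['method'], cfg['quant']['weight']['bit'], cfg['quant']['act']['bit'])
# -> Awq 8 8
```

The config itself parses fine here, which points at the model-loading step rather than the YAML.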
Running with this config, I get this error:
```
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/LLMCompression/llmc-main/llmc/__main__.py", line 271, in <module>
[rank0]: main(config)
[rank0]: File "/home/LLMCompression/llmc-main/llmc/__main__.py", line 27, in main
[rank0]: model = MODEL_REGISTRY[config.model.type](
[rank0]: File "/home/LLMCompression/llmc-main/llmc/models/qwen2.py", line 9, in __init__
[rank0]: super().__init__(model_path, torch_dtype, device_map, use_cache)
[rank0]: File "/home/LLMCompression/llmc-main/llmc/models/base_model.py", line 30, in __init__
[rank0]: self.find_embed_layers()
[rank0]: File "/home/LLMCompression/llmc-main/llmc/models/qwen2.py", line 16, in find_embed_layers
[rank0]: self.rotary_emb = self.model.model.rotary_emb
[rank0]: File "/root/miniconda3/envs/torch231/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'Qwen2Model' object has no attribute 'rotary_emb'
E1025 11:15:26.387000 140316599792832 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 2398) of binary: /root/miniconda3/envs/torch231/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/torch231/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/torch231/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/torch231/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/root/miniconda3/envs/torch231/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/root/miniconda3/envs/torch231/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/torch231/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/LLMCompression/llmc-main/llmc/__main__.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-10-25_11:15:26
host : 11ea6d23ac9f
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2398)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
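The `AttributeError` suggests a transformers version mismatch: newer releases (roughly 4.43+) attach a shared rotary embedding to `Qwen2Model`, while older ones only build one inside each attention block, and llmc's Qwen2 wrapper accesses the model-level one. A minimal diagnostic sketch (my assumption, not from the issue) to check which layout the installed version uses, without loading the full 7B checkpoint:

```python
# Sketch: check whether the installed transformers exposes the model-level
# rotary embedding that llmc accesses as self.model.model.rotary_emb.
import transformers
from transformers import AutoConfig, AutoModelForCausalLM

print('transformers version:', transformers.__version__)

cfg = AutoConfig.from_pretrained('/home/LLMCompression/model/Qwen2-7B')
cfg.num_hidden_layers = 1                      # shrink: we only inspect attributes
model = AutoModelForCausalLM.from_config(cfg)  # random weights, no checkpoint load

print('model-level rotary_emb:', hasattr(model.model, 'rotary_emb'))
print('per-layer rotary_emb:  ',
      hasattr(model.model.layers[0].self_attn, 'rotary_emb'))
```

If only the per-layer attribute exists, the installed transformers is older than what the current llmc code expects.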
This seems related to #139, but I have updated the repository and still get this error.
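For what it's worth, until the version mismatch is sorted out, a tolerant fallback in `llmc/models/qwen2.py` could look like the sketch below (my own workaround, not an official patch): use the model-level rotary embedding when it exists, otherwise fall back to the per-layer one.

```python
# Hypothetical workaround for find_embed_layers in llmc/models/qwen2.py,
# not an official fix: tolerate both rotary-embedding layouts.
def find_embed_layers(self):
    self.embed_tokens = self.model.model.embed_tokens
    if hasattr(self.model.model, 'rotary_emb'):
        # newer transformers: one shared rotary embedding on Qwen2Model
        self.rotary_emb = self.model.model.rotary_emb
    else:
        # older transformers: rotary embedding lives in each attention block
        self.rotary_emb = self.model.model.layers[0].self_attn.rotary_emb
```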