Fail to run SmoothQuant on Llama3-8B
gloritygithub11 opened this issue · 3 comments
gloritygithub11 commented
My config is as follows:
```yaml
base:
    seed: &seed 42
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /models/src/llmc/tools/data/calib/pileval
    n_samples: 512
    bs: 1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: []
    name: wikitext2
    download: False
    path: /models/src/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: SmoothQuant
    weight:
        bit: 8
        symmetric: True
        granularity: per_channel
    act:
        bit: 8
        symmetric: True
        granularity: per_token
save:
    save_trans: False
    # save_lightllm: True
    save_path: ./save
```
and I get this error:
```
Traceback (most recent call last):
  File "/models/src/llmc/llmc/__main__.py", line 160, in <module>
    main(config)
  File "/models/src/llmc/llmc/__main__.py", line 60, in main
    model.collect_first_block_input(calib_data)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/models/src/llmc/llmc/models/base_model.py", line 103, in collect_first_block_input
    self.model(data.to(next(self.model.parameters()).device))
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 940, in forward
    causal_mask = self._update_causal_mask(
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1061, in _update_causal_mask
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1309) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.0', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/models/src/llmc/llmc/__main__.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-08-23_08:57:09
  host      : 20b728e84f30
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1309)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
Harahan commented
This doesn't seem to be a problem with llmc. You can set the model's `torch_dtype` in the config to `torch.float16`, or refer to this issue.
Remark: remember to add `eval_pos` parameters, e.g., `pretrain`, `transformed`, or `fake_quant`, otherwise there is no ppl evaluation.
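For reference, a minimal sketch of the two config changes described above, assuming the llmc config schema from the original post (only the changed fields are shown; the rest of the config stays as posted):

```yaml
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: torch.float16                       # float16 avoids the missing bfloat16 triu kernel on torch 2.0.0
eval:
    eval_pos: [pretrain, transformed, fake_quant]    # otherwise no ppl evaluation is run
```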
gloritygithub11 commented
Thanks for the feedback; I will try it later.
The default torch installed in the given container ghcr.io/modeltc/lightllm:main is 2.0.0. Would it be better to raise the minimum torch version to 2.1.0, or to add a remark about the special handling needed for Llama3?
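For context, a minimal sketch that reproduces the underlying failure outside llmc, assuming a CUDA GPU and torch==2.0.0 (the tensor shape here is illustrative, not taken from the model):

```python
import torch

# transformers builds the causal mask with torch.full(...) followed by torch.triu(...);
# on torch 2.0.0 the CUDA triu/tril kernels have no bfloat16 implementation.
mask = torch.full((8, 8), float("-inf"), dtype=torch.bfloat16, device="cuda")
mask = torch.triu(mask, diagonal=1)
# torch 2.0.0: RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
# newer torch (>= 2.1) runs this fine, so bumping the minimum version would also avoid the error.
```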
Harahan commented
Thanks for your suggestion. We will talk to the lightllm developers later.