ModelTC/llmc

fail to run smoothquant on llama3-8b

gloritygithub11 opened this issue · 3 comments

Config is as follows:

base:
    seed: &seed 42
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /models/src/llmc/tools/data/calib/pileval
    n_samples: 512
    bs: 1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: []
    name: wikitext2
    download: False
    path: /models/src/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: SmoothQuant
    weight:
        bit: 8
        symmetric: True
        granularity: per_channel
    act:
        bit: 8
        symmetric: True
        granularity: per_token
save:
    save_trans: False
    # save_lightllm: True
    save_path: ./save

and get error:

Traceback (most recent call last):
  File "/models/src/llmc/llmc/__main__.py", line 160, in <module>
    main(config)
  File "/models/src/llmc/llmc/__main__.py", line 60, in main
    model.collect_first_block_input(calib_data)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/models/src/llmc/llmc/models/base_model.py", line 103, in collect_first_block_input
    self.model(data.to(next(self.model.parameters()).device))
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 940, in forward
    causal_mask = self._update_causal_mask(
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1061, in _update_causal_mask
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1309) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.0', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/models/src/llmc/llmc/__main__.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-08-23_08:57:09
  host      : 20b728e84f30
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1309)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

This doesn't seem to be a problem in llmc. You can set torch_dtype in the config to torch.float16, or refer to this issue.
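For example, a minimal sketch of that change against the model section of the config above, assuming the torch_dtype field accepts torch.float16 the same way it accepts auto:

```yaml
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: torch.float16   # instead of auto, which resolves to bfloat16 for Llama 3
```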

Remark: remember to set the eval_pos parameter, e.g., pretrain, transformed, or fake_quant; otherwise there is no ppl evaluation.
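For instance, to get a wikitext2 ppl both before quantization and with fake quantization applied, the eval section above could be changed along these lines (a sketch; the field names are taken from the config in this issue):

```yaml
eval:
    eval_pos: [pretrain, fake_quant]   # was [] — empty means no ppl evaluation
    name: wikitext2
    download: False
    path: /models/src/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
```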

Thanks for the feedback; I will try it later.

The default torch installed in the provided container ghcr.io/modeltc/lightllm:main is 2.0.0. Would it be better to raise the minimum torch version to 2.1.0, or to add a note about the special handling needed for Llama 3?
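Since the failing CUDA triu kernel for bfloat16 is available from torch 2.1.0 onward (which is why 2.1.0 is suggested as the minimum above), a pure-Python version guard could look like the following sketch. The helper name is hypothetical and not part of llmc:

```python
# Hypothetical helper: decide whether the installed torch version ships the
# CUDA triu/tril kernels for bfloat16 (the missing kernel behind the
# "triu_tril_cuda_template not implemented for 'BFloat16'" error in 2.0.0).
def bfloat16_triu_supported(torch_version: str) -> bool:
    # Compare only the (major, minor) components, e.g. "2.0.0" -> (2, 0).
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    return (major, minor) >= (2, 1)

print(bfloat16_triu_supported("2.0.0"))  # False: fall back to float16
print(bfloat16_triu_supported("2.1.0"))  # True: bfloat16 is safe
```

A caller could use this to rewrite torch_dtype to torch.float16 before loading the model on an older torch, instead of hard-failing inside the forward pass.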

Thanks for your suggestion. We will discuss it with the lightllm developers later.