ModelTC/llmc

fail to run smoothquant on llama3-8b

gloritygithub11 opened this issue · 3 comments

Config is as follows:

base:
    seed: &seed 42
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /models/src/llmc/tools/data/calib/pileval
    n_samples: 512
    bs: 1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: []
    name: wikitext2
    download: False
    path: /models/src/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: SmoothQuant
    weight:
        bit: 8
        symmetric: True
        granularity: per_channel
    act:
        bit: 8
        symmetric: True
        granularity: per_token
save:
    save_trans: False
    # save_lightllm: True
    save_path: ./save

and get error:

Traceback (most recent call last):
  File "/models/src/llmc/llmc/__main__.py", line 160, in <module>
    main(config)
  File "/models/src/llmc/llmc/__main__.py", line 60, in main
    model.collect_first_block_input(calib_data)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/models/src/llmc/llmc/models/base_model.py", line 103, in collect_first_block_input
    self.model(data.to(next(self.model.parameters()).device))
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 940, in forward
    causal_mask = self._update_causal_mask(
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1061, in _update_causal_mask
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1309) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.0', 'console_scripts', 'torchrun')())
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/models/src/llmc/llmc/__main__.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-08-23_08:57:09
  host      : 20b728e84f30
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1309)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

This doesn't seem to be a problem in llmc. You can set torch_dtype in the config to torch.float16, or refer to this issue.
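For example, a minimal sketch of that change against the model section of the config above, assuming the torch_dtype field accepts torch.float16 the same way it accepts auto:

```yaml
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: torch.float16   # instead of auto, which resolves to bfloat16 for Llama 3
```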

Remark: remember to set the eval_pos parameter, e.g., pretrain, transformed, or fake_quant; otherwise there is no ppl evaluation.
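For instance, to get a wikitext2 ppl both before quantization and with fake quantization applied, the eval section above could be changed along these lines (a sketch; the field names are taken from the config in this issue):

```yaml
eval:
    eval_pos: [pretrain, fake_quant]   # was [] — empty means no ppl evaluation
    name: wikitext2
    download: False
    path: /models/src/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
```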

Thanks for the feedback; I will try it later.

The default torch installed in the provided container ghcr.io/modeltc/lightllm:main is 2.0.0. Would it be better to raise the minimum torch version to 2.1.0, or to add a note about the special handling needed for Llama 3?
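Since the failing CUDA triu kernel for bfloat16 is available from torch 2.1.0 onward (which is why 2.1.0 is suggested as the minimum above), a pure-Python version guard could look like the following sketch. The helper name is hypothetical and not part of llmc:

```python
# Hypothetical helper: decide whether the installed torch version ships the
# CUDA triu/tril kernels for bfloat16 (the missing kernel behind the
# "triu_tril_cuda_template not implemented for 'BFloat16'" error in 2.0.0).
def bfloat16_triu_supported(torch_version: str) -> bool:
    # Compare only the (major, minor) components, e.g. "2.0.0" -> (2, 0).
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    return (major, minor) >= (2, 1)

print(bfloat16_triu_supported("2.0.0"))  # False: fall back to float16
print(bfloat16_triu_supported("2.1.0"))  # True: bfloat16 is safe
```

A caller could use this to rewrite torch_dtype to torch.float16 before loading the model on an older torch, instead of hard-failing inside the forward pass.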

Thanks for your suggestion. We will discuss it with the lightllm developers later.