llama3.1-70b awq_w4a4 error
lihongqiang opened this issue · 2 comments
I got an error when I run run_awq_llm.sh. I want to ask whether llmc supports Llama-3.1-70B, or whether my config is wrong. Please help me solve this problem.
My yml file:
base:
    seed: &seed 42
model:
    type: Llama
    path: /data/root/jupyter/modelscope/fintune/autodl-tmp/LLM-Research/Meta-Llama-3.1-8B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /data/root/jupyter/modelscope/fintune/llmc/data/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /data/root/jupyter/modelscope/fintune/llmc/data/wikitext2
    bs: 20
    inference_per_block: True
    # For 70B model eval, bs can be set to 20, and inference_per_block can be set to True.
    # For 7B / 13B model eval, bs can be set to 1, and inference_per_block can be set to False.
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_channel
        group_size: -1
        calib_algo: learnable
    act:
        bit: 4
        symmetric: False
        granularity: per_token
        calib_algo: minmax
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        clip_version: v2
        save_scale: True
        scale_path: ./save/Meta-Llama-3.1-8B-Instruct_awq_w4a4_scale
        save_clip: True
        clip_path: ./save/Meta-Llama-3.1-8B-Instruct_awq_w4a4_clip
save:
    save_trans: True
    save_quant: False
    save_path: ./save/Meta-Llama-3.1-8B-Instruct_awq_w4a4
We will fix this later.
This seems to be a bug in transformers. It can be worked around by adding the line "inv_freq_expanded = inv_freq_expanded.to(x.device)" just before line 153 in the modeling_llama.py file of the transformers package you installed.
By the way, running Llama 3.1 requires a newer version of transformers, which unfortunately may expose this particular bug. If you only need to run Llama 1, Llama 2, or Llama 3, simply downgrading the transformers version should be enough to avoid the issue.
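For reference, below is a rough, condensed sketch of what the patched region of LlamaRotaryEmbedding.forward in modeling_llama.py looks like after inserting that line. The exact line numbers and surrounding code depend on the transformers version you have installed, so treat this as an illustration of where the cast goes, not as the verbatim upstream source.

```python
import torch  # already imported in modeling_llama.py

# Condensed excerpt of LlamaRotaryEmbedding.forward (details vary by transformers version).
@torch.no_grad()
def forward(self, x, position_ids):
    inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    position_ids_expanded = position_ids[:, None, :].float()

    # Workaround: when layers are dispatched across devices (e.g. inference_per_block
    # for a 70B model), inv_freq can sit on a different device than x, so move it
    # explicitly before the matmul below.
    inv_freq_expanded = inv_freq_expanded.to(x.device)

    device_type = x.device.type if x.device.type != "mps" else "cpu"
    with torch.autocast(device_type=device_type, enabled=False):
        freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
        emb = torch.cat((freqs, freqs), dim=-1)
        cos = emb.cos()
        sin = emb.sin()
    return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
```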