
Why doesn't the model size change after compressing Llama-2-7b-hf with Wanda?

ChengShuting opened this issue · 1 comment

Config:

```yaml
base:
    seed: &seed 42
model:
    type: Llama
    path: /data/Llama-2-7b-hf
    torch_dtype: auto
calib:
    name: pileval
    download: True
    path: calib data path
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: [transformed]
    name: [wikitext2]
    download: True
    path: eval data path
    bs: 1
    seq_len: 2048
sparse:
    method: Wanda
    weight:
        sparsity: 0.5
        sparsity_out: True
save:
    save_fp: False
    save_trans: True
    save_path: ./save3
```
Original model:
[screenshot: file sizes of the original model]
After compression:
[screenshot: file sizes of the compressed model, unchanged]

Wanda is an unstructured pruning method: it zeroes out individual weights but keeps the dense tensor shapes and dtype, so the saved checkpoint is the same size as the original and the storage savings are currently only theoretical. To actually shrink the model, you can try the quantization algorithms together with save_lightllm and use our Lightllm for inference.
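
For reference, this is easy to verify outside llmc. Below is a minimal standalone PyTorch sketch (not llmc code; plain magnitude pruning stands in for the Wanda importance metric) showing that a weight tensor with 50% of its entries zeroed out serializes to the same number of bytes as the dense original.

```python
# Minimal standalone sketch (plain PyTorch, not llmc code): unstructured 50%
# sparsity zeroes individual weights but keeps the dense shape and dtype, so
# the serialized tensor occupies the same number of bytes as before pruning.
import os
import torch

w = torch.randn(1024, 1024, dtype=torch.float16)

# Stand-in for the Wanda importance metric: plain magnitude pruning of the
# smallest 50% of weights.
threshold = w.abs().float().median()
pruned = torch.where(w.abs().float() > threshold, w, torch.zeros_like(w))

torch.save(w, "dense.pt")
torch.save(pruned, "pruned.pt")
print(os.path.getsize("dense.pt"))          # ~2 MB
print(os.path.getsize("pruned.pt"))         # same ~2 MB: zeros are stored explicitly
print((pruned == 0).float().mean().item())  # ~0.5: half the weights are zero

# The zeros only pay off if the storage format or kernel exploits them
# (sparse layouts, 2:4 structured sparsity), or if the weights are also
# quantized to a lower-precision dtype.
```

This is why the model saved with `save_trans: True` above is still a dense checkpoint of the same size on disk; the introduced zeros only translate into storage or speed gains once a sparse-aware format or runtime, or quantization, is applied.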