High PPL when groupsize != -1 for OPT model after replacing linear layers with QuantLinear
hyx1999 opened this issue · 1 comments
I tried to test GPTQ's PPL metrics on the OPT model via opt.py. The PPL of the OPT model is normal with fake quantization. However, when I call opt_pack before opt_eval and set groupsize to a value other than -1 (e.g. 128), the PPL of the packed model is much larger than that of the fake-quantized model. When groupsize is set to -1, everything is fine.
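For context, group-wise quantization shares one scale per run of groupsize consecutive input features, so after packing, the kernel has to look up a different scale for each group during dequantization; reusing one per-row scale produces exactly this kind of PPL blow-up. A minimal numpy sketch of the fake-quantization side (hypothetical helper, not opt.py's actual code):

```python
import numpy as np

def fake_quantize_groupwise(w, wbits=4, groupsize=128):
    # Symmetric per-group fake quantization (hypothetical sketch).
    # Every `groupsize` consecutive input features share one scale, so a
    # packed kernel must apply the scale of the matching group, not a
    # single per-row scale.
    qmax = 2 ** (wbits - 1) - 1
    wq = np.empty_like(w)
    in_features = w.shape[1]
    for g0 in range(0, in_features, groupsize):
        g1 = min(g0 + groupsize, in_features)
        group = w[:, g0:g1]
        scale = np.abs(group).max(axis=1, keepdims=True) / qmax
        scale = np.maximum(scale, 1e-8)  # avoid division by zero
        q = np.clip(np.round(group / scale), -qmax - 1, qmax)
        wq[:, g0:g1] = q * scale  # dequantize back to float
    return wq
```

With wbits=4 the per-group rounding error stays below half a quantization step, which is why fake quantization alone barely moves PPL.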
wbits=4, groupsize=128, without opt_pack
wikitext2
Evaluating ...
0
1
2
3
4
5
6
7
8
9
10
11
28.715469360351562
wbits=4, groupsize=128, with opt_pack
wikitext2
Evaluating ...
0
1
2
3
4
5
6
7
8
9
10
11
778.898193359375
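For reference, the PPL printed above is the exponential of the mean per-token negative log-likelihood, so the jump from ~28.7 to ~778.9 means the packed model's average NLL rose by ln(778.9 / 28.7) ≈ 3.3 nats per token, far beyond quantization rounding noise. A minimal sketch of the metric:

```python
import math

def perplexity(nlls, n_tokens):
    # PPL = exp(total negative log-likelihood / number of tokens).
    return math.exp(sum(nlls) / n_tokens)
```

A difference this large points at broken weights after packing, not at precision loss.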
# opt_pack before opt_eval
if not args.load and args.wbits < 16 and not args.nearest:
    model = opt_pack(model, quantizers, args.wbits, args.groupsize)
    print("model:", "\n", model)
if args.eval:
    datasets = ['wikitext2']
    if args.new_eval:
        datasets = ['wikitext2']
    for dataset in datasets:
        dataloader, testloader = get_loaders(dataset, seed=args.seed, model=args.model, seqlen=model.seqlen, cache_dir=args.cache_dir)
        print(dataset)
        opt_eval(model, testloader, DEV)
I ran the above test with facebook/opt-125m.