OpenGVLab/EfficientQAT

Cannot reproduce the results

LiuSiQi-TJ opened this issue · 2 comments

Hello, thanks for releasing the code! I am very interested in your wonderful work!
However, I can't reproduce the results of the Block-AP stage. I tried to quantize Llama-2-7b using the config in block-ap/Llama-2-7b/w2g64.sh, with the default parameters except real_quant=False.
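For reference, the invocation is roughly the following (a sketch only, not the literal contents of w2g64.sh; the flag names are inferred from the Namespace dump in the log below, and the model path is just my local one):

CUDA_VISIBLE_DEVICES=0 python main_block_ap.py \
    --model pretrain_models/llama-7B \
    --net Llama-2 \
    --wbits 2 \
    --group_size 64 \
    --calib_dataset redpajama \
    --train_size 4096 \
    --val_size 64 \
    --training_seqlen 2048 \
    --epochs 2 \
    --quant_lr 1e-4 \
    --weight_lr 2e-5 \
    --eval_ppl \
    --eval_tasks piqa,arc_easy,arc_challenge,hellaswag,winogrande \
    --output_dir ./output/block_ap_log/Llama-2-7b-w2g64 \
    --save_quant_dir ./output/block_ap_models/Llama-2-7b-w2g64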

Here is my log:

[2024-08-01 11:39:33 root] (main_block_ap.py 118): INFO Namespace(model='pretrain_models/llama-7B', cache_dir='./cache', output_dir='./output/block_ap_log/Llama-2-7b-w2g64', save_quant_dir='./output/block_ap_models/Llama-2-7b-w2g64', real_quant=False, resume_quant=None, calib_dataset='redpajama', train_size=4096, val_size=64, training_seqlen=2048, batch_size=2, epochs=2, num_workers=2, prefetch_factor=None, ppl_seqlen=2048, seed=2, eval_ppl=True, eval_tasks='piqa,arc_easy,arc_challenge,hellaswag,winogrande', eval_batch_size=16, wbits=2, group_size=64, quant_lr=0.0001, weight_lr=2e-05, min_lr_factor=20, clip_grad=0.3, wd=0, net='Llama-2', max_memory='70GiB', early_stop=0, off_load_to_disk=False)
[2024-08-01 11:39:33 root] (main_block_ap.py 140): INFO === start quantization ===
[2024-08-01 11:39:34 root] (main_block_ap.py 147): INFO load trainloader from ./cache/dataloader_Llama-2_redpajama_4096_64_2048_train.cache
[2024-08-01 11:39:34 root] (main_block_ap.py 149): INFO load valloader from ./cache/dataloader_Llama-2_redpajama_4096_64_2048_val.cache
[2024-08-01 11:39:34 root] (block_ap.py 41): INFO Starting ...
[2024-08-01 11:40:55 root] (block_ap.py 166): INFO === Start quantize blocks 0===
[2024-08-01 11:47:18 root] (block_ap.py 274): INFO blocks 0 epoch 0 recon_loss:0.004269289318472147 val_loss:0.004439935088157654 quant_lr:5.246359588146619e-05 norm:0.00073618 max memory_allocated 8387.3271484375 time 291.4592695236206
[2024-08-01 11:52:08 root] (block_ap.py 274): INFO blocks 0 epoch 1 recon_loss:0.0053044268861413 val_loss:0.005488379392772913 quant_lr:5e-06 norm:0.00143698 max memory_allocated 8387.330078125 time 290.0065920352936
[2024-08-01 11:53:43 root] (block_ap.py 166): INFO === Start quantize blocks 1===
......
[2024-08-01 17:53:41 root] (block_ap.py 166): INFO === Start quantize blocks 29===
[2024-08-01 17:59:54 root] (block_ap.py 274): INFO blocks 29 epoch 0 recon_loss:4.219094753265381 val_loss:4.433276653289795 quant_lr:5.246359588146619e-05 norm:0.26454368 max memory_allocated 8391.3212890625 time 288.33731985092163
[2024-08-01 18:04:48 root] (block_ap.py 274): INFO blocks 29 epoch 1 recon_loss:4.196527004241943 val_loss:4.423843860626221 quant_lr:5e-06 norm:0.24296847 max memory_allocated 8391.3212890625 time 293.8242509365082
[2024-08-01 18:06:17 root] (block_ap.py 166): INFO === Start quantize blocks 30===
[2024-08-01 18:12:31 root] (block_ap.py 274): INFO blocks 30 epoch 0 recon_loss:5.096967697143555 val_loss:5.343477249145508 quant_lr:5.246359588146619e-05 norm:0.49191153 max memory_allocated 8391.3212890625 time 287.03909397125244
[2024-08-01 18:17:24 root] (block_ap.py 274): INFO blocks 30 epoch 1 recon_loss:5.063900947570801 val_loss:5.328426361083984 quant_lr:5e-06 norm:0.44734499 max memory_allocated 8391.3212890625 time 293.3575370311737
[2024-08-01 18:18:59 root] (block_ap.py 166): INFO === Start quantize blocks 31===
[2024-08-01 18:25:14 root] (block_ap.py 274): INFO blocks 31 epoch 0 recon_loss:8.381441116333008 val_loss:8.722651481628418 quant_lr:5.246359588146619e-05 norm:2.53573585 max memory_allocated 8391.3212890625 time 288.2078945636749
[2024-08-01 18:30:10 root] (block_ap.py 274): INFO blocks 31 epoch 1 recon_loss:8.29963493347168 val_loss:8.673598289489746 quant_lr:5e-06 norm:2.05528784 max memory_allocated 8391.3212890625 time 295.6733980178833
[2024-08-01 18:31:44 root] (main_block_ap.py 168): INFO 24730.73664689064
[2024-08-01 18:31:44 root] (main_block_ap.py 172): INFO start saving model
[2024-08-01 18:32:13 root] (main_block_ap.py 175): INFO save model success
[2024-08-01 18:35:01 root] (main_block_ap.py 39): INFO wikitext2 perplexity: 121.43
[2024-08-01 18:35:01 root] (main_block_ap.py 39): INFO c4 perplexity: 72.45

I find that, by the end, the reconstruction loss has increased to 8.29, and the perplexity on C4 and WikiText2 looks wrong.
Could you tell me whether there is anything wrong with my config, or could you share your log of Llama-2-7b w2g64 for the Block-AP stage?
Thank you very much!

I just tried it, and the results are correct in my experiments.

I notice that in your command the model you used is model='pretrain_models/llama-7B'. Is this Llama-2-7B or Llama-1-7B?

Also, the net you set is 'Llama-2'. Note that args.model should be consistent with args.net to ensure correct loading of the cached datasets (the cached dataloader file name is derived from args.net, as shown in your log).

Therefore, I suspect that you used the wrong model. You can delete the ./cache folder and then rerun with the correct args.model and args.net.
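For example, something along these lines (just a sketch; the checkpoint path is a placeholder, and the remaining arguments stay as in block-ap/Llama-2-7b/w2g64.sh):

rm -rf ./cache
CUDA_VISIBLE_DEVICES=0 python main_block_ap.py \
    --model /path/to/Llama-2-7b-hf \
    --net Llama-2 \
    --wbits 2 \
    --group_size 64 \
    ...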

Thank you for your reply!
I found a bug in my code: I was using the wrong model.
Now I can reproduce the results of Block-AP (avg. ppl is 8.62). Thank you very much!