quic/aimet-model-zoo

Reproducible PPL anomaly with GPT2 W8A8 (PPL = 17590.9778)


I used the GPT-2 model and tested its quantization accuracy.
Model download: https://github.com/quic/aimet-model-zoo/releases/download/torch_gpt2/gpt2_wikitext_finetune.tar.gz
Test data: wikitext-2-raw-v1
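
For reference, this is roughly how I fetched the checkpoint and how the evaluation data is loaded (a minimal sketch; the extracted tarball layout is my assumption):

```python
# Hedged reproduction sketch: fetch the released checkpoint and load the same
# dataset split. The tarball's extracted layout is an assumption.
import tarfile
import urllib.request

from datasets import load_dataset

URL = ("https://github.com/quic/aimet-model-zoo/releases/download/"
       "torch_gpt2/gpt2_wikitext_finetune.tar.gz")
urllib.request.urlretrieve(URL, "gpt2_wikitext_finetune.tar.gz")
with tarfile.open("gpt2_wikitext_finetune.tar.gz") as tar:
    tar.extractall("gpt2_wikitext_finetune")

# Same dataset/config the evaluator's log shows being reused from cache.
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
print(raw_datasets)
```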

| Item | Description |
| --- | --- |
| AIMET | 1.28.0 |
| Linux | 20.04 |
| CUDA | 11.6 |
| torch | 1.13.1+cu116 |
| Python | 3.8.10 |
| aimet-zoo-torch | 1.5.0 |

The FP32 accuracy is as expected, but the W8A8 perplexity is wildly off.
The results are as follows:
aimet_zoo_torch/gpt2/evaluators# python gpt2_quanteval.py --model_config gpt2_w8a8 --per_device_eval_batch_size 8
2023-10-19 02:52:23,612 - root - INFO - AIMET
2023-10-19 02:52:39,262 - datasets.builder - WARNING - Reusing dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 712.27it/s]
2023-10-19 02:52:39,374 - datasets.arrow_dataset - WARNING - Loading cached processed dataset at /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/cache-957e58d88e4ab49c.arrow
2023-10-19 02:52:39,407 - datasets.arrow_dataset - WARNING - Loading cached processed dataset at /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/cache-10932a0976197214.arrow
2023-10-19 02:52:39,440 - datasets.arrow_dataset - WARNING - Loading cached processed dataset at /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/cache-ba370f2b62ba6d71.arrow
2023-10-19 02:52:39,452 - datasets.arrow_dataset - WARNING - Loading cached processed dataset at /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/cache-1252412874756be5.arrow
2023-10-19 02:52:39,464 - datasets.arrow_dataset - WARNING - Loading cached processed dataset at /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/cache-cfab500129fdf76e.arrow
2023-10-19 02:52:39,476 - datasets.arrow_dataset - WARNING - Loading cached processed dataset at /root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/cache-af35feebb8a10af8.arrow
orig model fp32 inference
loss: 3.320616739840547 , ppl: 27.67741506034785
/usr/local/lib/python3.8/dist-packages/aimet_zoo_torch/gpt2/model/huggingface/baseline_models/gpt2/modeling_gpt2.py:188: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
w = w / (float(v.size(-1)) ** 0.5)
2023-10-19 02:52:51,367 - Quant - INFO - Unsupported op type Squeeze
2023-10-19 02:52:51,368 - Quant - INFO - Unsupported op type Mean
2023-10-19 02:52:51,542 - Quant - INFO - Selecting DefaultOpInstanceConfigGenerator to compute the specialized config. hw_version:default
loss: 3.1809085607528687 , ppl: 24.06861141667116
sim_orig model int8 inference
loss: 9.775141424384 , ppl: 17590.977796391602
2023-10-19 02:53:10,600 - main - INFO - Original model performances
2023-10-19 02:53:10,601 - main - INFO - ===========================
2023-10-19 02:53:10,601 - main - INFO - Original Model | 32-bit Environment | perplexity : 27.6774
2023-10-19 02:53:10,601 - main - INFO - Original Model | 8-bit Environment | perplexity: 17590.9778
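
As a sanity check, the reported perplexities are consistent with PPL = exp(mean cross-entropy loss), so the loss itself explodes under W8A8 rather than the perplexity being misreported:

```python
import math

# Perplexity is exp(mean cross-entropy loss); the logged (loss, ppl) pairs agree:
print(math.exp(3.320616739840547))   # ~27.6774   (FP32 baseline)
print(math.exp(3.1809085607528687))  # ~24.0686   (after QuantSim construction)
print(math.exp(9.775141424384))      # ~17590.98  (W8A8 simulation)
```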

Is there any issue with my usage?
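
For reference, my understanding of a standard AIMET W8A8 QuantizationSimModel flow is sketched below. Here `model`, `calibrate`, `calib_data`, and `eval_ppl` are placeholders, and this may differ from what gpt2_quanteval.py does internally:

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# `model` is the fine-tuned GPT-2 (placeholder); dummy input uses GPT-2's vocab size.
dummy_input = torch.randint(0, 50257, (1, 1024)).cuda()
sim = QuantizationSimModel(
    model.cuda(),
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,    # W8: 8-bit weights
    default_output_bw=8,   # A8: 8-bit activations
)

# Calibrate activation ranges on a few batches before measuring perplexity.
sim.compute_encodings(forward_pass_callback=calibrate,        # placeholder callback
                      forward_pass_callback_args=calib_data)  # placeholder data
eval_ppl(sim.model)  # placeholder evaluation returning the W8A8 perplexity
```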