Why don't the size and precision of the model change after INT4 quantization?
Describe the bug
FP32: pin.bin size: 25455 KB
FP32: pin.xml size: 597 KB

int4_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8)
serialize(int4_model, r'C:\Users\JX1402006\Desktop\XML\int8\pin.xml')

INT4: pin.bin size: 25455 KB
INT4: pin.xml size: 597 KB
Environment
python 3.9
openvino 2023.3.0
nncf 2.8.1
Minimal Reproducible Example
from openvino.runtime import Core, serialize
from nncf import compress_weights, CompressWeightsMode

ie = Core()
model = ie.read_model(r'C:\Users\JX1402006\Desktop\XML\pin.xml')

# Compress 80% of the weights to asymmetric INT4 with a group size of 128.
int4_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8)
serialize(int4_model, r'C:\Users\JX1402006\Desktop\XML\int8\pin.xml')
Are you going to submit a PR?
- Yes, I'd like to help by submitting a PR!
compress_weights applies weight compression to the model in-place, i.e. the input model itself is modified as well. Could you check whether you serialized the input model before calling compress_weights when calculating the model size?
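For reference, here is a minimal sketch of how the size comparison could be restructured so that the FP32 baseline is written to disk before compression touches the model. The fp32/ and int4/ output folders are illustrative and not taken from the original report.

import os

from openvino.runtime import Core, serialize
from nncf import compress_weights, CompressWeightsMode

ie = Core()
model = ie.read_model(r'C:\Users\JX1402006\Desktop\XML\pin.xml')

# Snapshot the FP32 baseline BEFORE compression, since compress_weights
# modifies `model` in-place. (Output folders here are illustrative.)
serialize(model, r'C:\Users\JX1402006\Desktop\XML\fp32\pin.xml')

int4_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8)
serialize(int4_model, r'C:\Users\JX1402006\Desktop\XML\int4\pin.xml')

# Compare the weight files on disk; the INT4 .bin should now be smaller.
for path in (r'C:\Users\JX1402006\Desktop\XML\fp32\pin.bin',
             r'C:\Users\JX1402006\Desktop\XML\int4\pin.bin'):
    print(path, os.path.getsize(path) // 1024, 'KB')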