Why don't the size and precision of the model change after INT4 quantization?
Describe the bug
FP32: pin.bin size: 25455 KB
FP32: pin.xml size: 597 KB

int4_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8)
serialize(int4_model, r'C:\Users\JX1402006\Desktop\XML\int8\pin.xml')

INT4: pin.bin size: 25455 KB
INT4: pin.xml size: 597 KB
Environment
python 3.9
openvino 2023.3.0
nncf 2.8.1
Minimal Reproducible Example
from openvino.runtime import Core, serialize
from nncf import compress_weights, CompressWeightsMode

ie = Core()
model = ie.read_model(r'C:\Users\JX1402006\Desktop\XML\pin.xml')

# Compress 80% of the weights to asymmetric INT4 with a group size of 128.
int4_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8)
serialize(int4_model, r'C:\Users\JX1402006\Desktop\XML\int8\pin.xml')
Are you going to submit a PR?
- Yes, I'd like to help by submitting a PR!
compress_weights applies weight compression to the model in-place, i.e. the input model itself is modified as well. Could you check whether you serialized the input model before calling compress_weights when calculating the model size?
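For reference, here is a minimal sketch of how the size comparison could be restructured so that the FP32 baseline is written to disk before compression touches the model. The fp32/ and int4/ output folders are illustrative and not taken from the original report.

import os

from openvino.runtime import Core, serialize
from nncf import compress_weights, CompressWeightsMode

ie = Core()
model = ie.read_model(r'C:\Users\JX1402006\Desktop\XML\pin.xml')

# Snapshot the FP32 baseline BEFORE compression, since compress_weights
# modifies `model` in-place. (Output folders here are illustrative.)
serialize(model, r'C:\Users\JX1402006\Desktop\XML\fp32\pin.xml')

int4_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8)
serialize(int4_model, r'C:\Users\JX1402006\Desktop\XML\int4\pin.xml')

# Compare the weight files on disk; the INT4 .bin should now be smaller.
for path in (r'C:\Users\JX1402006\Desktop\XML\fp32\pin.bin',
             r'C:\Users\JX1402006\Desktop\XML\int4\pin.bin'):
    print(path, os.path.getsize(path) // 1024, 'KB')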