ModelTC/MQBench

Comparison with the unquantized model on the OpenVINO backend


I already have the quantized openvino_deploy_model.xml, and now I want to obtain the unquantized version for comparison. I removed the enable_calibration() and enable_quantization() calls and generated the .xml file with the OpenVINO mo (Model Optimizer); a rough sketch of both export paths is at the end of this post. However, when I use benchmark_app to measure the latency, the reported inference precision seems to be the same for both models:

  • For the unquantized version:
(openvino) yu.zhou@blx-jcam1:~/denoise/quantization$ benchmark_app -m /home/ISAS.DE/yu.zhou/denoise/quantization/experiment/denoise5/mmv_im2im_deploy_model.xml -nstream 1 -data_shape [1,1,32,128,128] -api sync
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
         API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.2
         Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 11.73 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'input' precision f32, dimensions ([...]): ? 1 32 128 128
[ INFO ] Model output 'output' precision f32, dimensions ([...]): ? 1 32 128 128
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 86.49 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 16)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   AFFINITY  , Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT  , PerformanceMode.LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.06 ms
[ WARNING ] No input files were given for input 'input'!. This input will be filled with random values!
[ INFO ] Fill input 'input' with random values 
[Step 10/11] Measuring performance (Start inference synchronously, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 64.88 ms
[Step 11/11] Dumping statistics report
Count:          1987 iterations
Duration:       60004.70 ms
Latency:
    AVG:        29.90 ms
    MIN:        29.75 ms
    MAX:        39.06 ms
Throughput: 33.48 FPS
  • For the quantized version:
(openvino) yu.zhou@blx-jcam1:~/denoise/quantization$ benchmark_app -m /home/ISAS.DE/yu.zhou/denoise/quantization/experiment/denoise4/mmv_im2im_deploy_model.xml -nstream 1 -data_shape [1,1,32,128,128] -api sync
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
         API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.2
         Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 23.47 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'input' precision f32, dimensions ([...]): ? 1 32 128 128
[ INFO ] Model output 'output' precision f32, dimensions ([...]): ? 1 32 128 128
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 73.35 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 16)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  , 
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   AFFINITY  , Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT  , PerformanceMode.LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.07 ms
[ WARNING ] No input files were given for input 'input'!. This input will be filled with random values!
[ INFO ] Fill input 'input' with random values 
[Step 10/11] Measuring performance (Start inference synchronously, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 50.39 ms
[Step 11/11] Dumping statistics report
Count:          3750 iterations
Duration:       60006.52 ms
Latency:
    AVG:        15.70 ms
    MIN:        15.61 ms
    MAX:        23.41 ms
Throughput: 63.81 FPS

Both runs report INFERENCE_PRECISION_HINT as f32. Meanwhile, the throughput after quantization (63.81 FPS) is even higher than before quantization (33.48 FPS). So I am not sure whether I did the right thing to obtain the unquantized model. Thank you!
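
For context, the two export paths described above look roughly like the sketch below. This is a simplified reconstruction, not my exact script: the tiny TinyDenoise module, the single-batch "calibration", the model_name, file names, and the opset version are all placeholders, and the quantized branch just follows the usual MQBench prepare → calibrate → enable-quantization → deploy flow for the OpenVINO backend.

```python
import torch
import torch.nn as nn
from mqbench.prepare_by_platform import prepare_by_platform, BackendType
from mqbench.utils.state import enable_calibration, enable_quantization
from mqbench.convert_deploy import convert_deploy

# Tiny stand-in for the real denoising network (placeholder architecture).
class TinyDenoise(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

model = TinyDenoise().eval()
dummy = torch.randn(1, 1, 32, 128, 128)          # matches -data_shape above

# --- quantized path: usual MQBench flow for the OpenVINO backend ---
qmodel = prepare_by_platform(model, BackendType.OPENVINO)  # insert fake-quant nodes
enable_calibration(qmodel)                                  # collect activation ranges
qmodel(dummy)                                               # real code: loop over calibration data
enable_quantization(qmodel)                                 # quantized forward from now on
convert_deploy(qmodel, BackendType.OPENVINO,
               {'input': [1, 1, 32, 128, 128]},
               model_name='mmv_im2im')                      # writes the deploy model

# --- unquantized path: plain FP32 ONNX export, then the Model Optimizer ---
torch.onnx.export(model, dummy, 'mmv_im2im_fp32.onnx',
                  input_names=['input'], output_names=['output'],
                  opset_version=11)
# shell: mo --input_model mmv_im2im_fp32.onnx
```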
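
Independently of the benchmark output, one way to double-check which IR is actually quantized would be to count the FakeQuantize ops in each .xml. A minimal sketch, assuming the OpenVINO 2022.2 Python API and (abbreviated) paths from the commands above:

```python
from openvino.runtime import Core

core = Core()
for tag, xml in [('denoise5 (unquantized?)',
                  'experiment/denoise5/mmv_im2im_deploy_model.xml'),
                 ('denoise4 (quantized?)',
                  'experiment/denoise4/mmv_im2im_deploy_model.xml')]:
    ov_model = core.read_model(xml)
    # Quantized IRs carry explicit FakeQuantize layers; an FP32 export should have none.
    n_fq = sum(1 for op in ov_model.get_ops()
               if op.get_type_name() == 'FakeQuantize')
    print(f'{tag}: {n_fq} FakeQuantize ops')
```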

This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!