Comparison with the unquantized model on the OpenVINO backend
audreyeternal commented
I already have the quantized `openvino_deploy_model.xml`, and I want to get the unquantized version for comparison. I removed the `enable_calibration()` and `enable_quantization()` calls and generated the `.xml` file using the OpenVINO `mo` tool (Model Optimizer). However, when I use `benchmark_app` to measure the latency, the `inference_precision` seems to be the same in both cases:
- for unquantized version:
(openvino) yu.zhou@blx-jcam1:~/denoise/quantization$ benchmark_app -m /home/ISAS.DE/yu.zhou/denoise/quantization/experiment/denoise5/mmv_im2im_deploy_model.xml -nstream 1 -data_shape [1,1,32,128,128] -api sync
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ INFO ] Device info
CPU
openvino_intel_cpu_plugin version 2022.2
Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 11.73 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'input' precision f32, dimensions ([...]): ? 1 32 128 128
[ INFO ] Model output 'output' precision f32, dimensions ([...]): ? 1 32 128 128
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 86.49 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ] AVAILABLE_DEVICES , ['']
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 1, 1)
[ INFO ] RANGE_FOR_STREAMS , (1, 16)
[ INFO ] FULL_DEVICE_NAME , Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
[ INFO ] OPTIMIZATION_CAPABILITIES , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ] CACHE_DIR ,
[ INFO ] NUM_STREAMS , 1
[ INFO ] AFFINITY , Affinity.CORE
[ INFO ] INFERENCE_NUM_THREADS , 0
[ INFO ] PERF_COUNT , False
[ INFO ] INFERENCE_PRECISION_HINT , <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT , PerformanceMode.LATENCY
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.06 ms
[ WARNING ] No input files were given for input 'input'!. This input will be filled with random values!
[ INFO ] Fill input 'input' with random values
[Step 10/11] Measuring performance (Start inference synchronously, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 64.88 ms
[Step 11/11] Dumping statistics report
Count: 1987 iterations
Duration: 60004.70 ms
Latency:
AVG: 29.90 ms
MIN: 29.75 ms
MAX: 39.06 ms
Throughput: 33.48 FPS
- for quantized version:
(openvino) yu.zhou@blx-jcam1:~/denoise/quantization$ benchmark_app -m /home/ISAS.DE/yu.zhou/denoise/quantization/experiment/denoise4/mmv_im2im_deploy_model.xml -nstream 1 -data_shape [1,1,32,128,128] -api sync
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading OpenVINO
[ WARNING ] PerformanceMode was not explicitly specified in command line. Device CPU performance hint will be set to LATENCY.
[ INFO ] OpenVINO:
API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ INFO ] Device info
CPU
openvino_intel_cpu_plugin version 2022.2
Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2
[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 23.47 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'input' precision f32, dimensions ([...]): ? 1 32 128 128
[ INFO ] Model output 'output' precision f32, dimensions ([...]): ? 1 32 128 128
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 73.35 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ] AVAILABLE_DEVICES , ['']
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS , (1, 1, 1)
[ INFO ] RANGE_FOR_STREAMS , (1, 16)
[ INFO ] FULL_DEVICE_NAME , Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
[ INFO ] OPTIMIZATION_CAPABILITIES , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ] CACHE_DIR ,
[ INFO ] NUM_STREAMS , 1
[ INFO ] AFFINITY , Affinity.CORE
[ INFO ] INFERENCE_NUM_THREADS , 0
[ INFO ] PERF_COUNT , False
[ INFO ] INFERENCE_PRECISION_HINT , <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT , PerformanceMode.LATENCY
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.07 ms
[ WARNING ] No input files were given for input 'input'!. This input will be filled with random values!
[ INFO ] Fill input 'input' with random values
[Step 10/11] Measuring performance (Start inference synchronously, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 50.39 ms
[Step 11/11] Dumping statistics report
Count: 3750 iterations
Duration: 60006.52 ms
Latency:
AVG: 15.70 ms
MIN: 15.61 ms
MAX: 23.41 ms
Throughput: 63.81 FPS
In both runs the `INFERENCE_PRECISION_HINT` is fp32. Meanwhile, the throughput after quantization is higher than before quantization (63.81 FPS vs. 33.48 FPS, with an average latency of 15.70 ms vs. 29.90 ms). So I am not sure whether I did the right thing to obtain the unquantized model. Thank you!
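One way to check whether an exported IR is really unquantized, independently of what `benchmark_app` prints, might be to look at the layer types in the `.xml` file itself. As far as I understand, `INFERENCE_PRECISION_HINT` is a device-level property and does not describe the model, so it may read `float32` for both files. The sketch below assumes that a quantized IR contains layers with `type="FakeQuantize"` and that an unquantized one does not; the two sample documents are tiny hand-written stand-ins, not real models, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_layer_types(ir_xml_text: str) -> Counter:
    """Count the <layer> elements of an IR .xml document by their `type` attribute."""
    root = ET.fromstring(ir_xml_text)
    return Counter(layer.get("type") for layer in root.iter("layer"))


def looks_quantized(ir_xml_text: str) -> bool:
    """Heuristic (assumption): a quantized IR has at least one FakeQuantize layer."""
    return count_layer_types(ir_xml_text)["FakeQuantize"] > 0


# Hand-written miniature stand-ins for IR files (hypothetical, for illustration only).
SAMPLE_FP32 = """
<net name="demo" version="11">
  <layers>
    <layer id="0" name="input" type="Parameter" version="opset1"/>
    <layer id="1" name="conv" type="Convolution" version="opset1"/>
    <layer id="2" name="output" type="Result" version="opset1"/>
  </layers>
</net>
"""

SAMPLE_INT8 = """
<net name="demo" version="11">
  <layers>
    <layer id="0" name="input" type="Parameter" version="opset1"/>
    <layer id="1" name="fq" type="FakeQuantize" version="opset1"/>
    <layer id="2" name="conv" type="Convolution" version="opset1"/>
    <layer id="3" name="output" type="Result" version="opset1"/>
  </layers>
</net>
"""

print("fp32 sample quantized?", looks_quantized(SAMPLE_FP32))
print("int8 sample quantized?", looks_quantized(SAMPLE_INT8))
```

For the real files, the text of each `mmv_im2im_deploy_model.xml` (from `experiment/denoise5` and `experiment/denoise4`) could be read with `open(path, encoding="utf-8").read()` and passed to `count_layer_types`. If only the second one reports `FakeQuantize` layers, the first export is probably the unquantized model, which would also be consistent with its higher latency.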
github-actions commented
This issue has not received any updates in 120 days. Please reply to this issue if this still unresolved!