ModelTC/MQBench

Is dynamic batch supported?

Closed this issue · 2 comments

Hi, thank you for developing this very powerful quantization toolkit!
I have a question about converting the quantized ONNX file to a TensorRT engine. Does MQBench support dynamic batch?
In my project, the batch size is not fixed at inference time, but I didn't find anything about dynamic batch in the toolkit. It seems we have to set the --batch-size flag manually in onnx2trt.py.
Thank you!

I have solved the problem, so I will share my solution for enabling dynamic batch in MQBench.

  • First, the --batch-size flag in onnx2trt.py has nothing to do with TensorRT engine generation.
  • When we use the convert_deploy() function to convert the quantized module to an ONNX file, we should pass **extra_kwargs (a fuller call sketch follows this list):
# mark axis 0 (the batch dimension) of every input/output as dynamic:
dynamic_axes = {'input_1': {0: 'batch_size'}, 'output_1': {0: 'batch_size'}, 'output_2': {0: 'batch_size'}}
extra_kwargs = dict(input_names=["input_1"], output_names=["output_1", "output_2"], dynamic_axes=dynamic_axes)
convert_deploy(..., **extra_kwargs)
  • Add an optimization profile for the input binding in onnx2trt.py:
profile = builder.create_optimization_profile()
profile.set_shape("input_1", (1, 2, 224, 224), (4, 2, 224, 224), (8, 2, 224, 224))  # min, opt, max shapes (batch 1 / 4 / 8)
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)
  • When running inference, remember to set the active optimization profile (a complete inference sketch follows this list):
self.context = self.engine.create_execution_context()
self.context.active_optimization_profile = 0
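For completeness, here is a fuller sketch of the export call from the second step. The positional input_shape_dict usage follows MQBench's documented convert_deploy interface, but the model variable, tensor names, shapes, and keyword arguments below are only illustrative and may differ between MQBench versions:

from mqbench.prepare_by_platform import BackendType
from mqbench.convert_deploy import convert_deploy

dynamic_axes = {'input_1': {0: 'batch_size'}, 'output_1': {0: 'batch_size'}, 'output_2': {0: 'batch_size'}}
extra_kwargs = dict(input_names=["input_1"], output_names=["output_1", "output_2"], dynamic_axes=dynamic_axes)

# quantized_model is the (hypothetical) module returned by prepare_by_platform() after calibration;
# the shape here only fixes the dummy input used for tracing, while the exported
# ONNX graph keeps a dynamic batch dimension thanks to dynamic_axes.
convert_deploy(quantized_model.eval(), BackendType.Tensorrt,
               {'input_1': [1, 2, 224, 224]},
               model_name='quantized_model_dynamic',
               **extra_kwargs)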
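The steps above stop at activating the profile; with a dynamic batch you also have to tell the execution context the actual input shape for each call and size the buffers from it. Below is a minimal, self-contained sketch of that step, assuming the TensorRT 7/8 Python API with pycuda, a single input at binding index 0, and the input_1 / (N, 2, 224, 224) layout used above. It is an illustration, not MQBench code:

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

def infer(engine, batch):
    # batch: contiguous float32 array, e.g. shape (4, 2, 224, 224); the batch size
    # must lie within the [min, max] range of the optimization profile.
    context = engine.create_execution_context()
    context.active_optimization_profile = 0
    context.set_binding_shape(0, batch.shape)  # resolve the dynamic batch dim for this call

    host_bufs, dev_bufs, bindings = [], [], []
    for i in range(engine.num_bindings):
        shape = tuple(context.get_binding_shape(i))      # concrete shapes after set_binding_shape
        dtype = trt.nptype(engine.get_binding_dtype(i))
        if engine.binding_is_input(i):
            host = np.ascontiguousarray(batch.astype(dtype))
        else:
            host = np.empty(shape, dtype=dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))

    cuda.memcpy_htod(dev_bufs[0], host_bufs[0])   # copy the input batch to the device
    context.execute_v2(bindings)                  # synchronous execution
    for i in range(1, engine.num_bindings):       # copy outputs back to the host
        cuda.memcpy_dtoh(host_bufs[i], dev_bufs[i])
    return host_bufs[1:]

Note that newer TensorRT releases deprecate active_optimization_profile and set_binding_shape in favor of set_optimization_profile_async and set_input_shape, so check the API of the version you build against.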

This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!