Is dynamic batch supported?
Closed this issue · 2 comments
audreyeternal commented
Hi, thank you for developing this very powerful quantization toolkit!
I have a question about converting the quantized onnx file to TensorRT engine. Does mqbench support dynamic batch?
Because in my project, the batch size is not fixed at inference time. However, I didn't find anything about dynamic batch in the toolkit. It seems we need to manually set the `--batch-size` flag in the `onnx2trt.py` file.
Thank you!
audreyeternal commented
I have solved the problem, and I will share my solution for enabling dynamic batch in MQBench.
- First, the `--batch-size` flag in the `onnx2trt.py` file has nothing to do with TRT engine generation.
- When we use the `convert_deploy()` function to convert the quantized module to an ONNX file, we should pass `**extra_kwargs`:

```python
# axis 0 (batch dimension) as dynamic:
dynamic_axes = {'input_1': {0: 'batch_size'}, 'output_1': {0: 'batch_size'}, 'output_2': {0: 'batch_size'}}
extra_kwargs = dict(input_names=["input_1"], output_names=["output_1", "output_2"], dynamic_axes=dynamic_axes)
convert_deploy(..., **extra_kwargs)
```
- Add an optimization profile for the input binding in the `onnx2trt.py` file:

```python
profile = builder.create_optimization_profile()
profile.set_shape("input_1", (1, 2, 224, 224), (4, 2, 224, 224), (8, 2, 224, 224))  # min, opt, max shapes
config.add_optimization_profile(profile)
engine = builder.build_engine(network, config)
```
- When doing inference, keep in mind to select the active optimization profile:

```python
self.context = self.engine.create_execution_context()
self.context.active_optimization_profile = 0
```
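One detail the snippet above leaves out: with an explicit-batch engine, TensorRT also needs the concrete input shape set on the execution context once the actual batch size is known, otherwise the dynamic axis stays unresolved. A hedged sketch of that inference path (buffer allocation and the `engine` object are assumed to come from the earlier steps; this is not runnable without a GPU and a built engine):

```python
context = engine.create_execution_context()
context.active_optimization_profile = 0  # select the profile added at build time

batch = 4  # any value within the [min, max] range of the profile
context.set_binding_shape(0, (batch, 2, 224, 224))  # resolve the dynamic axis

# Device buffers should then be sized from context.get_binding_shape(i)
# for each binding i, and inference run with e.g. context.execute_v2(bindings).
```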
github-actions commented
This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!