Here is the GPU inference code, but it doesn't seem to run much faster than CPU
Apisteftos commented
I ran inference with the model on both CPU and GPU, but the difference between the two is only about 5 ms: almost 26 ms on the CPU versus 21 ms with CUDA. I don't understand why it runs so slowly. I suspect the export to ONNX format was not successful.
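For context, a quick sanity check worth doing first is to confirm that the CUDA execution provider actually loaded, since ONNX Runtime silently falls back to CPU when it can't. A minimal sketch ("model.onnx" is a placeholder path, not the actual model file):

import onnxruntime

# Request CUDA with CPU fallback; "model.onnx" stands in for the real model path.
session = onnxruntime.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# Expect ['CUDAExecutionProvider', 'CPUExecutionProvider'] if CUDA loaded.
print(session.get_providers())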
import time
import numpy as np
import onnxruntime

def inference(self, input_tensor):
    input_name = self.session.get_inputs()[0].name
    output_name = self.session.get_outputs()[0].name
    iobinding = self.session.io_binding()
    # Copy the input to GPU memory once and bind it, so the run call
    # does not transfer it from host memory itself.
    ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(input_tensor, 'cuda', 0)
    iobinding.bind_input(input_name, 'cuda', 0, np.float32,
                         ortvalue.shape(), ortvalue.data_ptr())
    # Let ONNX Runtime allocate the output on the GPU as well.
    iobinding.bind_output(output_name, 'cuda', 0)
    start = time.perf_counter()
    #outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
    self.session.run_with_iobinding(iobinding)
    print(f"Inference time: {(time.perf_counter() - start)*1000:.2f} ms")
    # Transfer the result back to host memory (not included in the timing).
    outputs = iobinding.copy_outputs_to_cpu()
    return outputs
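One likely explanation for the small gap is that a single timed call includes one-time CUDA costs (memory allocation, kernel setup, cuDNN autotuning), which inflate the GPU number. A minimal benchmarking sketch with warm-up runs and an averaged timing loop, assuming the same io_binding setup as above (benchmark, warmup, and runs are hypothetical names):

import time
import numpy as np
import onnxruntime

def benchmark(session, input_tensor, warmup=10, runs=100):
    # Bind the input and output to the GPU once, outside the timed loop.
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    iobinding = session.io_binding()
    ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(input_tensor, 'cuda', 0)
    iobinding.bind_input(input_name, 'cuda', 0, np.float32,
                         ortvalue.shape(), ortvalue.data_ptr())
    iobinding.bind_output(output_name, 'cuda', 0)

    # Warm-up: the first few CUDA runs pay one-time initialization costs.
    for _ in range(warmup):
        session.run_with_iobinding(iobinding)

    # Average over many runs for a stable per-inference figure.
    start = time.perf_counter()
    for _ in range(runs):
        session.run_with_iobinding(iobinding)
    print(f"Mean inference time: {(time.perf_counter() - start) / runs * 1000:.2f} ms")

    return iobinding.copy_outputs_to_cpu()

If the steady-state GPU time is still close to the CPU time after warm-up, the bottleneck is more likely the model itself (small batch size, or operators that fell back to CPU) than the ONNX export.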