Here is the GPU inference code, but it doesn't seem to run much faster than CPU
Apisteftos commented
I ran inference with the model on both CPU and GPU, but the difference between the two is only about 5 ms: almost 26 ms on the CPU versus 21 ms with CUDA. I don't understand why it runs so slowly. I suspect the export to ONNX format was not successful.
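For context, a quick sanity check worth doing first is to confirm that the CUDA execution provider actually loaded, since ONNX Runtime silently falls back to CPU when it can't. A minimal sketch ("model.onnx" is a placeholder path, not the actual model file):

import onnxruntime

# Request CUDA with CPU fallback; "model.onnx" stands in for the real model path.
session = onnxruntime.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# Expect ['CUDAExecutionProvider', 'CPUExecutionProvider'] if CUDA loaded.
print(session.get_providers())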
import time
import numpy as np
import onnxruntime

def inference(self, input_tensor):
    input_name = self.session.get_inputs()[0].name
    output_name = self.session.get_outputs()[0].name
    iobinding = self.session.io_binding()
    # Copy the input to GPU memory once and bind it, so the run call
    # does not transfer it from host memory itself.
    ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(input_tensor, 'cuda', 0)
    iobinding.bind_input(input_name, 'cuda', 0, np.float32,
                         ortvalue.shape(), ortvalue.data_ptr())
    # Let ONNX Runtime allocate the output on the GPU as well.
    iobinding.bind_output(output_name, 'cuda', 0)
    start = time.perf_counter()
    #outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
    self.session.run_with_iobinding(iobinding)
    print(f"Inference time: {(time.perf_counter() - start)*1000:.2f} ms")
    # Transfer the result back to host memory (not included in the timing).
    outputs = iobinding.copy_outputs_to_cpu()
    return outputs
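One likely explanation for the small gap is that a single timed call includes one-time CUDA costs (memory allocation, kernel setup, cuDNN autotuning), which inflate the GPU number. A minimal benchmarking sketch with warm-up runs and an averaged timing loop, assuming the same io_binding setup as above (benchmark, warmup, and runs are hypothetical names):

import time
import numpy as np
import onnxruntime

def benchmark(session, input_tensor, warmup=10, runs=100):
    # Bind the input and output to the GPU once, outside the timed loop.
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    iobinding = session.io_binding()
    ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(input_tensor, 'cuda', 0)
    iobinding.bind_input(input_name, 'cuda', 0, np.float32,
                         ortvalue.shape(), ortvalue.data_ptr())
    iobinding.bind_output(output_name, 'cuda', 0)

    # Warm-up: the first few CUDA runs pay one-time initialization costs.
    for _ in range(warmup):
        session.run_with_iobinding(iobinding)

    # Average over many runs for a stable per-inference figure.
    start = time.perf_counter()
    for _ in range(runs):
        session.run_with_iobinding(iobinding)
    print(f"Mean inference time: {(time.perf_counter() - start) / runs * 1000:.2f} ms")

    return iobinding.copy_outputs_to_cpu()

If the steady-state GPU time is still close to the CPU time after warm-up, the bottleneck is more likely the model itself (small batch size, or operators that fell back to CPU) than the ONNX export.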