Is it possible to use ONNX?
MinGiSa opened this issue · 4 comments
I would like to run this model with ONNX.
Is that possible?
Thank you for your interest in TEED.
Honestly, I do not know, but it is a simple network, so you probably can. Please let me know how it goes.
Xavier
Successfully exported to ONNX, and inference works:
import time

import cv2
import numpy as np
import onnxruntime as ort
import torch

from ted import TED  # model definition from the TEED repository (ted.py)

# --- Export the PyTorch checkpoint to ONNX ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TED().to(device)
model.load_state_dict(torch.load('checkpoints/BIPED/7/7_model.pth',
                                 map_location=device))
model.eval()

img_height = 352
img_width = 352
batch_size = 8
dummy_input = torch.rand(batch_size, 3, img_height, img_width).to(device)

# Mark batch, height and width as dynamic so the exported model accepts any input size
params = {
    0: 'batch_size',
    2: 'image_height',
    3: 'image_width',
}
dynamic_axes_dict = {
    'input': params,
    'out_1': params,
    'out_2': params,
    'out_3': params,
    'block_cat': params,
}
torch.onnx.export(model,
                  dummy_input,
                  "teed.onnx",
                  dynamic_axes=dynamic_axes_dict,
                  output_names=['out_1', 'out_2', 'out_3', 'block_cat'],
                  input_names=['input'])


# --- Inference with ONNX Runtime ---
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def image_normalization(img, img_min=0, img_max=255):
    # Min-max normalize to [img_min, img_max]; mirrors the helper used in the TEED repo
    epsilon = 1e-12
    img = np.float32(img)
    return (img - np.min(img)) * (img_max - img_min) / (np.max(img) - np.min(img) + epsilon) + img_min


ort_session = ort.InferenceSession("teed.onnx")

image_path = 'test.jpg'
image = cv2.imread(image_path, cv2.IMREAD_COLOR)
i_h, i_w, _ = image.shape
image = cv2.resize(image, None, fx=0.3, fy=0.3, interpolation=cv2.INTER_LINEAR)

# Round the spatial size up to a multiple of 16 so the downsampling stages align
img_width = ((image.shape[1] // 16) + 1) * 16
img_height = ((image.shape[0] // 16) + 1) * 16
image = cv2.resize(image, (img_width, img_height))

# HWC uint8 -> NCHW float32
image = np.array(image, dtype=np.float32)
image = image.transpose((2, 0, 1))
image = image[np.newaxis, :]

start_time = time.time()
outputs = ort_session.run(
    None,
    {"input": image},
)
outputs = np.array(outputs)
print("--- %s seconds ---" % (time.time() - start_time))

# Each output is a logit map; apply the sigmoid and keep the fused ('block_cat') map
edge_maps = []
for i in outputs:
    edge_maps.append(sigmoid(i))
tensor = np.array(edge_maps)
output = np.squeeze(tensor[:, 0, ...])
fuse = output[-1]
fuse = np.uint8(image_normalization(fuse))
fuse = cv2.bitwise_not(fuse)  # invert so edges are dark on a white background
fuse = cv2.resize(fuse, (i_w, i_h), interpolation=cv2.INTER_LINEAR)
cv2.imwrite('out.png', fuse)
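As a quick sanity check (not part of the snippet above), one way to confirm the export is to validate the graph with onnx.checker and compare the ONNX Runtime output against the PyTorch model. A minimal sketch, assuming `model` is the TED instance loaded above and that its forward pass returns a list of logit maps, as the post-processing above implies:

import numpy as np
import onnx
import onnxruntime as ort
import torch

onnx.checker.check_model(onnx.load("teed.onnx"))  # validate the exported graph

dummy = torch.rand(1, 3, 352, 352)
with torch.no_grad():
    torch_outs = model.cpu().eval()(dummy)         # list of edge-map logits
ort_outs = ort.InferenceSession("teed.onnx").run(None, {"input": dummy.numpy()})

# Compare the fused ('block_cat') output of both backends
np.testing.assert_allclose(torch_outs[-1].numpy(), ort_outs[-1], rtol=1e-3, atol=1e-4)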
Thank you for sharing it!
Thanks @xavysp for this great model, and thanks @ewfian for sharing your conversion script! I created a Colab that does the conversion to ONNX with some pre/post-processing built in, as well as float16 conversion. BTW @ewfian, I am not sure why you rounded the input size up to a multiple of 16 rather than 4.
The Colab also includes demo inference code and benchmarks for onnxruntime on python/cuda, js/wasm and js/webgpu. In these synthetic benchmarks, with a 1920x1080 input and the float16 model, I get 25 fps on a T4 for python/cuda and 10 fps on a 3070 Ti laptop GPU for WebGPU. I am still working on improving the latter, as I am using this in my live YouTube video effects demonstrator: https://github.com/eyaler/LordTubeMaster
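The Colab itself is not reproduced here. For reference, one common way to get a float16 ONNX model is the onnxconverter-common package; the sketch below is an illustration of that route (file names and the provider list are assumptions, not taken from the Colab):

import onnx
import onnxruntime as ort
from onnxconverter_common import float16

model_fp32 = onnx.load("teed.onnx")
# keep_io_types keeps float32 inputs/outputs while internal weights become float16
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)
onnx.save(model_fp16, "teed_fp16.onnx")

# Run with CUDA if available, falling back to CPU
sess = ort.InferenceSession("teed_fp16.onnx",
                            providers=["CUDAExecutionProvider", "CPUExecutionProvider"])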