xavysp/TEED

Is it possible to use ONNX?

MinGiSa opened this issue · 4 comments

I'm willing to use this model via ONNX.
Is that possible?

xavysp commented

Thank you for your interest in TEED.
Honestly, I do not know, but it is a simple network, so maybe you can. Please let me know how it goes.

Xavier

ewfian commented

I successfully exported the model to ONNX and ran inference with it:

import time

import cv2
import numpy as np
import onnxruntime as ort
import torch

from ted import TED  # TEED model definition (ted.py in this repo)


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def image_normalization(img, img_min=0, img_max=255, epsilon=1e-12):
    # Min-max normalization to [img_min, img_max], as in TEED's utils
    img = np.float32(img)
    return (img - np.min(img)) * (img_max - img_min) / (
        (np.max(img) - np.min(img)) + epsilon) + img_min


# --- Export to ONNX ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TED().to(device)
model.load_state_dict(torch.load('checkpoints/BIPED/7/7_model.pth',
                                 map_location=device))
model.eval()

img_height = 352
img_width = 352
batch_size = 8
dummy_input = torch.rand(batch_size, 3, img_height, img_width)
# Mark the batch and spatial dimensions as dynamic so the exported
# model accepts arbitrary input sizes
params = {
    0: 'batch_size',
    2: 'image_height',
    3: 'image_width',
}
dynamic_axes_dict = {
    'input': params,
    'out_1': params,
    'out_2': params,
    'out_3': params,
    'block_cat': params,
}
torch.onnx.export(model,
                  dummy_input,
                  "teed.onnx",
                  dynamic_axes=dynamic_axes_dict,
                  output_names=['out_1', 'out_2', 'out_3', 'block_cat'],
                  input_names=['input'])

# --- Inference with onnxruntime ---
ort_session = ort.InferenceSession("teed.onnx")

image_path = 'test.jpg'
image = cv2.imread(image_path, cv2.IMREAD_COLOR)
i_h, i_w, _ = image.shape
image = cv2.resize(image, None, fx=0.3, fy=0.3, interpolation=cv2.INTER_LINEAR)
# Round the spatial dimensions up to the next multiple of 16
img_width = ((image.shape[1] // 16) + 1) * 16
img_height = ((image.shape[0] // 16) + 1) * 16
image = cv2.resize(image, (img_width, img_height))
image = np.array(image, dtype=np.float32)
image = image.transpose((2, 0, 1))  # HWC -> CHW
image = image[np.newaxis, :]        # add batch dimension

start_time = time.time()

outputs = ort_session.run(
    None,
    {"input": image},
)
outputs = np.array(outputs)  # (4, 1, 1, H, W): four edge maps

print("--- %s seconds ---" % (time.time() - start_time))

# Logits -> probabilities
edge_maps = [sigmoid(i) for i in outputs]

tensor = np.array(edge_maps)
output = tensor[:, 0, ...]   # first image in the batch
output = np.squeeze(output)  # (4, H, W)
fuse = output[-1]            # fused map ('block_cat')
fuse = np.uint8(image_normalization(fuse))
fuse = cv2.bitwise_not(fuse)  # invert: dark edges on white background
fuse = cv2.resize(fuse, (i_w, i_h), interpolation=cv2.INTER_LINEAR)

cv2.imwrite('out.png', fuse)
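
A note for GPU inference: onnxruntime selects an execution provider when the session is created, so to run the exported model on CUDA (assuming the onnxruntime-gpu package is installed) you can pass a provider list. A minimal sketch:

import onnxruntime as ort

# Prefer CUDA; fall back to CPU if it is unavailable
ort_session = ort.InferenceSession(
    "teed.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)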

Thank you for sharing it!

Thanks @xavysp for this great model, and thanks @ewfian for sharing your conversion script! I created a Colab notebook that does the conversion to ONNX, with some pre/post-processing baked in, as well as float16 conversion. BTW @ewfian, I am not sure why you used 16 for the padding and not 4.
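
On the padding question: the input only needs its height and width to be multiples of the network's total downsampling stride, so if that stride is 4 (as suggested above), padding to the next multiple of 4 suffices; 16 also works but pads more than necessary. A minimal sketch of stride-aware padding, assuming an HWC image array:

import numpy as np

def pad_to_multiple(image: np.ndarray, multiple: int = 4) -> np.ndarray:
    # Zero-pad bottom/right so height and width are multiples of `multiple`;
    # unlike ((w // 16) + 1) * 16, this adds nothing when already aligned.
    h, w = image.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    return np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))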

The Colab also includes demo inference code and benchmarks for onnxruntime on Python/CUDA, JS/WASM, and JS/WebGPU. In these synthetic benchmarks, with 1920x1080 input and the float16 model, I get 25 fps on a T4 for Python/CUDA and 10 fps on a 3070 Ti laptop GPU for WebGPU. I am still working on improving the latter, as I am using this in my live YouTube video effects demonstrator: https://github.com/eyaler/LordTubeMaster
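
For reference, one common way to do the float16 conversion mentioned above is with the onnxconverter-common package (this may or may not match what the Colab uses). A minimal sketch:

import onnx
from onnxconverter_common import float16

model = onnx.load("teed.onnx")
# Cast the graph's float32 tensors to float16
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, "teed_fp16.onnx")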