NVlabs/FasterViT

Error while exporting to dynamic ONNX

tp-nan opened this issue · 8 comments

tp-nan commented

B = int(windows.shape[0] / (H * W / window_size / window_size))

Hi, it seems the batch dimension should not be cast to a Python int if dynamic ONNX export is intended:

dynamic_axes={"input": [0], "output": [0]} if batch_first else None,
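One common workaround would be to let view/reshape infer the batch dimension with -1 instead of computing B as a Python int. Below is a minimal sketch of a Swin-style window_reverse written that way; the names and tensor layout are my assumptions, not the actual FasterViT code:

import torch

def window_reverse(windows, window_size, H, W):
    # windows: [B * num_windows, window_size, window_size, C]
    C = windows.shape[-1]
    # Let view infer the batch dimension with -1 so no Python int
    # gets baked into the traced graph at export time.
    x = windows.view(-1, H // window_size, W // window_size,
                     window_size, window_size, C)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, H, W, C)
    return x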

tp-nan commented

Ah, it seems B is not the batch dim.

Thanks @ShiyangZhang. I am glad the issue is resolved.

tp-nan commented

Hi, I double-checked it. When the input is of shape [5, 3, 224, 224], B is equal to 5.
It seems that when exporting to ONNX, instead of:

B = int(windows.shape[0] / (H * W / window_size / window_size))

it should be
B = windows.shape[0] / (H * W / window_size / window_size)

Another question: is it possible to maintain an explicit batch dimension? For example, could layers after window_partition with shape [20, 49, 256] instead be [5, 4, 49, 256]? See the toy reshape below for what I mean.
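For this second question, the layout I have in mind is roughly the following (hypothetical shapes, just to illustrate):

import torch

B, num_windows, tokens, C = 5, 4, 49, 256
# Layout produced after window_partition: [B * num_windows, tokens, C]
flat = torch.randn(B * num_windows, tokens, C)   # [20, 49, 256]
# Layout with the batch kept explicit: [B, num_windows, tokens, C]
explicit = flat.view(B, num_windows, tokens, C)  # [5, 4, 49, 256]
print(explicit.shape)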

Hi @ShiyangZhang, thanks for raising these concerns. Could you please provide more insight into each of the questions?

Also, I just pushed a new version of the ONNX conversion script with opset_version=17, in case you would like to check it.

tp-nan commented

Sure. Thanks for your time.
For the first question, convert the model to ONNX at 224 resolution:

python onnx_convert.py 

then change

x = torch.randn((1, 3, 1024, 1024))

to

x = torch.randn((5, 3, 224, 224))

and run

python onnx_test.py 

which raises:

root@nisp-dmi-03:/workspace# python onnx_test.py 
2023-06-21 04:54:57.420280128 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running LayerNormalization node. Name:'/levels.2/downsample/norm/LayerNormalization' Status Message: Size of X.shape()[axis:] == 1280. Size of scale and bias (if provided) must match this. Got scale size of 256 and bias size of 256
Traceback (most recent call last):
  File "onnx_test.py", line 17, in <module>
    outputs = ort_sess.run(None, {'input': x.numpy()})
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running LayerNormalization node. Name:'/levels.2/downsample/norm/LayerNormalization' Status Message: Size of X.shape()[axis:] == 1280. Size of scale and bias (if provided) must match this. Got scale size of 256 and bias size of 256

Meanwhile, if the input is

x = torch.randn((1, 3, 224, 224))

it says:
Predicted shape: (1, 1000)
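For what it's worth, the mismatch of 1280 vs. 256 is consistent with the batch being folded into the channel dimension somewhere (5 * 256 = 1280). A purely hypothetical toy illustration of how a reshape with a baked-in batch of 1 could produce that:

import torch

C = 256
x = torch.randn(5, 49, C)      # batch 5 at runtime
# If B = 1 was baked in at export time, a downstream reshape like
# this folds the real batch into the channel axis:
y = x.reshape(1, 49, -1)
print(y.shape[-1])             # 1280 == 5 * 256, matching the error above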

Hi @ShiyangZhang, thanks for providing the logs. However, I am still not able to reproduce this issue. Per your suggestion I used x = torch.randn((5, 3, 224, 224)) and could successfully generate the ONNX file. I am using onnx version 1.13.1.

Would you please confirm your onnx version and try 1.13.1 as well if possible?

Thanks again for the effort.

tp-nan commented

I mean generating the ONNX file (torch.onnx.export) with x = torch.randn((1, 3, 224, 224)), but running it (onnx_test.py, onnxruntime) with x = torch.randn((5, 3, 224, 224)).
Since the model is exported with dynamic_axes, running it with a different batch size should be possible.
Thanks again for your help.
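The export/run pattern I am describing is roughly the sketch below. The model factory call and file names are assumptions on my side, not the repo's actual onnx_convert.py:

import torch
import onnxruntime as ort
from fastervit import create_model  # assumption: factory name/model id may differ

# Export once at batch size 1, with the batch axis marked dynamic.
model = create_model("faster_vit_0_224", pretrained=False).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "faster_vit.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": [0], "output": [0]},
    opset_version=17,
)

# Then run the exported graph at a different batch size (5 here).
sess = ort.InferenceSession("faster_vit.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {"input": torch.randn(5, 3, 224, 224).numpy()})[0]
print(out.shape)  # expected (5, 1000) once dynamic batch works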

Hi @ShiyangZhang

Thank you so much for bringing this to our attention. #27 should address this and enable dynamic ONNX batch sizes.