fabio-sim/LightGlue-ONNX

Issues with deployment on the web


Hello @fabio-sim,

I'm now trying to deploy the ONNX models directly in the browser. They work fine with WebAssembly (wasm), but with WebGL they cannot be executed because Int64 is not supported on WebGL and SuperPoint uses this data type. So, is there any way of modifying the models to use Int32 instead of Int64?

On the other hand, I have also found that the optimized models don't work in the browser (neither with wasm nor with WebGL). The error is completely non-descriptive, so I have no idea what could be happening. Maybe onnxruntime-web doesn't support some of the features used in the optimized models; I don't know.

Thanks!

Hi @adricostas

You can try to convert INT64 to INT32 using a script like https://github.com/aadhithya/onnx-typecast.
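A quick way to see what such a script needs to touch is to list everything in the graph that is still typed as INT64, both before and after running it. A minimal sketch (the file name is a placeholder):

```python
# Minimal sketch: list everything in an ONNX graph that is typed as INT64.
import onnx
from onnx import TensorProto

model = onnx.load("superpoint.onnx")  # placeholder file name
graph = model.graph

int64_initializers = [i.name for i in graph.initializer
                      if i.data_type == TensorProto.INT64]
int64_values = [v.name
                for v in list(graph.input) + list(graph.output) + list(graph.value_info)
                if v.type.tensor_type.elem_type == TensorProto.INT64]
int64_casts = [n.name for n in graph.node
               if n.op_type == "Cast"
               and any(a.name == "to" and a.i == TensorProto.INT64 for a in n.attribute)]

print("INT64 initializers:", int64_initializers)
print("INT64 values:", int64_values)
print("Cast-to-INT64 nodes:", int64_casts)
```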

Regarding the optimized models, I believe you're right about the unsupported ops (I fused the attention nodes into a single Multihead-Attention op, which seems to only work on onnxruntime-gpu with the CUDA Execution Provider).
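If you want to confirm which ops are the problem, something like this sketch lists every op outside the standard ONNX domain, e.g. the fused attention contrib op (the file name is a placeholder):

```python
# Sketch: list ops from non-standard domains (e.g. com.microsoft contrib ops),
# which onnxruntime-web may not implement.
import onnx

model = onnx.load("superpoint_lightglue_fused.onnx")  # placeholder file name
contrib_ops = {(node.domain, node.op_type)
               for node in model.graph.node
               if node.domain not in ("", "ai.onnx")}
print(contrib_ops)
```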

Hi,

Thanks for the prompt answer. Yes, I have already tried that option, but it didn't work. The new model is generated, but when I try to load it with onnxruntime-web it returns an error:

Error: Can't create a session. ERROR_CODE: 1, ERROR_MESSAGE: Type Error: Type parameter (T) of Optype (Mul) bound to different types (tensor(int64) and tensor(int32) in node (/Mul_1).
    at t.checkLastError (bundle.min.js:1:307223)

Regarding the optimized models, do you think it would be possible to have an intermediate alternative that optimizes the models while keeping them usable on the web? I am interested in reducing the inference time, but above all in reducing the size of the model (MB).

That error indicates that the conversion is incomplete (i.e., the script missed several ops). Unfortunately, to my knowledge there's no easy way to force the export to use only INT32 (pytorch/pytorch#47980). You would have to create a converter similar to ONNX's FP32->FP16 converter, but for INT64->INT32.
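For reference, a rough sketch of such a pass is below. Note that it only covers initializers, Cast targets, and declared tensor types; ops like Shape, NonZero, and ArgMax always output INT64 per the ONNX spec, so a complete converter would also have to insert Casts after them, which is likely why your Mul node ends up with mixed types. File names are placeholders.

```python
# Rough sketch only: retypes INT64 initializers, Cast targets, and declared tensor
# types to INT32. Ops whose outputs are INT64 by spec (Shape, NonZero, ArgMax, ...)
# are NOT handled here and would still need Cast nodes inserted after them.
import numpy as np
import onnx
from onnx import TensorProto, numpy_helper

def convert_int64_to_int32(path_in: str, path_out: str) -> None:
    model = onnx.load(path_in)
    graph = model.graph

    # 1) INT64 initializers (constants/weights) -> INT32.
    for init in graph.initializer:
        if init.data_type == TensorProto.INT64:
            arr = numpy_helper.to_array(init).astype(np.int32)
            init.CopyFrom(numpy_helper.from_array(arr, init.name))

    # 2) Cast nodes targeting INT64 -> target INT32 instead.
    for node in graph.node:
        if node.op_type == "Cast":
            for attr in node.attribute:
                if attr.name == "to" and attr.i == TensorProto.INT64:
                    attr.i = TensorProto.INT32

    # 3) Graph inputs/outputs/value_info declared as INT64 -> INT32.
    for value in list(graph.input) + list(graph.output) + list(graph.value_info):
        tensor_type = value.type.tensor_type
        if tensor_type.elem_type == TensorProto.INT64:
            tensor_type.elem_type = TensorProto.INT32

    onnx.save(model, path_out)

convert_int64_to_int32("superpoint_int64.onnx", "superpoint_int32.onnx")  # placeholder names
```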

I'm not an expert at web deployment, but if the WebAssembly execution provider works fine as you've said, a straightforward way to reduce the model size is to use FP16 models. You could go one step further and quantize the models to INT8, but I haven't tried this route yet.
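Both routes boil down to standard ONNX tooling rather than anything specific to LightGlue-ONNX. Something along these lines should work as a starting point (untested on the web side; file names are placeholders):

```python
# Sketch of the two size-reduction routes mentioned above.
import onnx
from onnxconverter_common import float16
from onnxruntime.quantization import quantize_dynamic, QuantType

# FP32 -> FP16: roughly halves the file size.
model = onnx.load("superpoint_lightglue_end2end.onnx")  # placeholder file name
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, "superpoint_lightglue_end2end_fp16.onnx")

# Dynamic INT8 quantization of the weights: roughly a 4x size reduction,
# at some cost in matching accuracy that would need to be evaluated.
quantize_dynamic(
    "superpoint_lightglue_end2end.onnx",
    "superpoint_lightglue_end2end_int8.onnx",
    weight_type=QuantType.QInt8,
)
```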