microsoft/onnxruntime

[web] `ort.InferenceSession.create` silently hangs/fails on iOS/iPad browsers if COEP/COOP headers are set

josephrocca opened this issue · 12 comments

Describe the bug
COEP/COOP headers must be set to cross-origin-isolate the page, which allows use of Wasm threads. If these headers are set, then the model doesn't load on iOS/iPad browsers. It simply "hangs" and never finishes initialization.

Urgency
The app is in production for thousands of users per day, but I'm not in a position to hurry you along 😅

System information

This bug only occurs on iOS and iPad browsers. I've tested the latest version of Chrome (v102) and Safari on iOS and iPad.

The bug does not occur on desktop browsers, including Safari on macOS. Every non-iOS/iPad browser that I've tested works fine.

To Reproduce

Note that I've had to use Replit instead of a service like JSBin because JSBin doesn't allow you to set headers.

Additional context

  • Note that, as you can see in the front-end code, I'm pre-downloading the file with fetch before passing it to ort.InferenceSession.create. This doesn't affect the loading behavior at all; I'm only doing it to rule out model downloading as the cause.
  • For convenience, here's a direct link to the model being loaded in the above-linked minimal reproduction: https://huggingface.co/rocca/informative-drawings-line-art-onnx/resolve/main/model.onnx
  • The only way I was able to debug Chrome iOS was to visit chrome://inspect and start the logger. I didn't observe any error messages, but it could be that error messages aren't being correctly shown on that page, so perhaps it's failing with an error that I'm unable to see.

This should be due to the same cause as #11567

My understanding is that multi-threaded WebAssembly is not supported on Apple devices, or at least not on iOS (I am not 100% sure about this; correct me if I am wrong). So with COEP/COOP headers set, ORT Web thinks it is OK to enable multi-threading, which causes the failure.

@fs-eire Ah I see. It looks like the latest macOS Safari has thread support, and the issue you linked was using the latest Safari version, and in my testing macOS Safari works fine, so I'm not too sure what's going on there. It is specifically iOS browsers (i.e. iOS WebKit) that don't work for me.

I've just tried using these support checks from this web.dev article, and iOS WebKit apparently has support for everything except tail calls and SIMD:

const bigInt = () => (async e => { try { return (await WebAssembly.instantiate(e)).instance.exports.b(BigInt(0)) === BigInt(0); } catch (e) { return !1; } })(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 6, 1, 96, 1, 126, 1, 126, 3, 2, 1, 0, 7, 5, 1, 1, 98, 0, 0, 10, 6, 1, 4, 0, 32, 0, 11]));
const bulkMemory = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 4, 1, 96, 0, 0, 3, 2, 1, 0, 5, 3, 1, 0, 1, 10, 14, 1, 12, 0, 65, 0, 65, 0, 65, 0, 252, 10, 0, 0, 11]));
const exceptions = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 4, 1, 96, 0, 0, 3, 2, 1, 0, 10, 8, 1, 6, 0, 6, 64, 25, 11, 11]));
const multiValue = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 6, 1, 96, 0, 2, 127, 127, 3, 2, 1, 0, 10, 8, 1, 6, 0, 65, 0, 65, 0, 11]));
const mutableGlobals = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 2, 8, 1, 1, 97, 1, 98, 3, 127, 1, 6, 6, 1, 127, 1, 65, 0, 11, 7, 5, 1, 1, 97, 3, 1]));
const referenceTypes = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 4, 1, 96, 0, 0, 3, 2, 1, 0, 10, 7, 1, 5, 0, 208, 112, 26, 11]));
const saturatedFloatToInt = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 4, 1, 96, 0, 0, 3, 2, 1, 0, 10, 12, 1, 10, 0, 67, 0, 0, 0, 0, 252, 0, 26, 11]));
const signExtensions = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 4, 1, 96, 0, 0, 3, 2, 1, 0, 10, 8, 1, 6, 0, 65, 0, 192, 26, 11]));
const simd = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0, 10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11]));
const tailCall = async () => WebAssembly.validate(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 4, 1, 96, 0, 0, 3, 2, 1, 0, 10, 6, 1, 4, 0, 18, 0, 11]));
const threads = () => (async e => { try { return "undefined" != typeof MessageChannel && new MessageChannel().port1.postMessage(new SharedArrayBuffer(1)), WebAssembly.validate(e); } catch (e) { return !1; } })(new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0, 1, 4, 1, 96, 0, 0, 3, 2, 1, 0, 5, 4, 1, 3, 1, 1, 10, 11, 1, 9, 0, 65, 0, 254, 16, 2, 0, 26, 11]));

So if those support checks are doing their job correctly, then it seems that this issue isn't to do with threads. That said, setting ort.env.wasm.numThreads = 1 does indeed fix the issue, so maybe the above threads support check is incomplete/wrong?

For now I've added a check for simd support and set numThreads=1 if there is no support. This works for now, but perhaps only by accident 🤔
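A sketch of that workaround (the SIMD probe is the same validate-based check listed above; `chooseNumThreads` is a hypothetical helper, and the final `ort.env` assignment assumes onnxruntime-web is loaded in the page):

```javascript
// Probe SIMD support with a tiny module containing a v128 instruction,
// and fall back to a single thread when it is missing.
const simdSupported = WebAssembly.validate(new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123,
  3, 2, 1, 0, 10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]));

// Hypothetical helper: pick a thread count given the probe result.
function chooseNumThreads(simdOk, hardwareConcurrency) {
  return simdOk ? Math.min(4, hardwareConcurrency || 1) : 1;
}

// In the app this would be:
// ort.env.wasm.numThreads = chooseNumThreads(simdSupported, navigator.hardwareConcurrency);
```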

I made a change to optimize the feature detection; however, it needs to be validated before merging.

I debugged my iOS Safari with the demo website. The failure on iOS is due to a RangeError: Out of memory error.

This error should be happening inside WebAssembly.instantiate. Since the single-threaded version works totally fine (and it can work with 100MB+ models), this issue is weird.

I have no clue how to debug the issue further. I think it is simply a bug in iOS Safari (macOS Safari works fine). The PR #11707 does not help to resolve this, because both the old and new detection return true for multi-thread support.

Is there a minimal version of that RangeError: Out of memory that could be submitted as a webkit bug report?

And in the meantime, is it possible to catch the RangeError: Out of memory and, if it's iOS, fall back to single-threaded? Much better for it to work slowly than to not work at all on iOS.
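A hedged sketch of that suggested fallback (this is the proposal, not current ORT behavior; `env` and `createSession` stand in for `ort.env.wasm` and `ort.InferenceSession.create` respectively):

```javascript
// Try multi-threaded initialization first; on an out-of-memory RangeError,
// drop to a single thread and retry once.
async function createWithFallback(env, createSession, modelBytes) {
  try {
    return await createSession(modelBytes);
  } catch (err) {
    const isOom = err instanceof RangeError && /out of memory/i.test(err.message);
    if (isOom && env.numThreads !== 1) {
      env.numThreads = 1; // fall back to the single-threaded Wasm build
      return await createSession(modelBytes);
    }
    throw err;
  }
}
```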

I'm also experiencing this issue with multi-threaded + COOP headers with browsers on iOS devices.

I tried this change, but in my case I'm not able to work around this with ort.env.wasm.numThreads = 1

Update: I had to remove COOP headers to make it work again on iOS devices. My model is pretty large, though: 22MB.

d12 commented

Is there any workaround or fix yet besides disabling multithreading on iOS? I'm still seeing this issue a year later and I'd rather not disable multithreading on iOS, that'd be half of our userbase.

It has been some time, and I am not sure whether Apple has managed to fix this problem in Safari on iOS. @d12 do you still observe the issue happening on iOS?

d12 commented

@fs-eire yeah, still happening on iOS 16.7.2 :/

On iOS 17.6 the problem is still present

Seeing a crash during ort.InferenceSession.create on iOS 16.7.8 even with ort.env.wasm.numThreads = 1

Seems to crash on iPhone 8 but run slowly on iPhone X, both on iOS 16.7.8 so I'd be curious what else can be done to reduce the memory ceiling and prevent crashes.

When attempting to use 1.19.0-dev.20240727-1ce160883f I get this error, which I can't find any leads on to solve:
wasm streaming compile failed: CompileError: WebAssembly.instantiateStreaming(): section (code 1, "Type") extends past end of the module (length 36659183, remaining bytes 12804882) @+8
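That "extends past end of the module" error usually means the .wasm binary was cut short in transit: a section header declares more bytes than were actually received. A quick sanity check, with a hypothetical helper name, is to compare the downloaded size against the expected length and run WebAssembly.validate before instantiating:

```javascript
// A truncated binary typically keeps the valid 8-byte header, so a
// magic-number check passes while validate() fails -- matching the
// "extends past end of the module" compile error above.
// checkWasmBytes is a hypothetical helper, not an ORT API.
function checkWasmBytes(bytes, expectedLength) {
  const hasMagic =
    bytes.length >= 8 &&
    bytes[0] === 0x00 && bytes[1] === 0x61 && // "\0asm"
    bytes[2] === 0x73 && bytes[3] === 0x6d;
  return {
    hasMagic,
    complete: expectedLength == null || bytes.length === expectedLength,
    valid: WebAssembly.validate(bytes),
  };
}

// Example: a minimal valid (empty) module vs. the same bytes cut short.
const emptyModule = new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0]);
const truncated = emptyModule.slice(0, 4);
```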

> Seeing a crash during ort.InferenceSession.create on iOS 16.7.8 even with ort.env.wasm.numThreads = 1
>
> Seems to crash on iPhone 8 but run slowly on iPhone X, both on iOS 16.7.8, so I'd be curious what else can be done to reduce the memory ceiling and prevent crashes.
>
> When attempting to use 1.19.0-dev.20240727-1ce160883f I get this error which I can't find any leads to solve: wasm streaming compile failed: CompileError: WebAssembly.instantiateStreaming(): section (code 1, "Type") extends past end of the module (length 36659183, remaining bytes 12804882) @+8

This is weird. Did you verify whether it works in other environments? (Windows/macOS/Android)