Support wavlm-base-plus-sv with WebGPU

Question

Opened this issue 3 months ago · 3 comments

flatsiedatsie commented 3 months ago

System Info

Transformers.js Alpha 10, Brave

Environment/Platform

Description

Not sure what happened, but:

1. Did a page refresh
2. Started Whisper
3. Saw this:

I did just fiddle with moving a wasm file into a local folder. But since it doesn't seem to load those, it shouldn't be the cause.

Reproduction

I'll share more if I can reproduce it myself.

Answer 1 · 2024-08-31T08:57:47.000Z

It seems to be related to the audio verification model I'm using.

Perhaps I'm accidentally running it in parallel.

Answer 2 · 2024-09-02T08:24:59.000Z

I had forgotten an await. But even after fixing that it still occurs.

I've added a 5-second delay between verifications of the audio snippets, and in the console I can see that it only attempts a verification every 5+ seconds. So it's definitely not running in parallel.
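For reference, here is a minimal sketch of what strictly serial verification looks like, and where a forgotten `await` would let calls overlap. The names `verifySequentially`, `VerifyFn` and `fakeVerify` are made up for illustration; `fakeVerify` only stands in for the real processor and model call, which isn't shown in this thread.

```typescript
// Hypothetical stand-in type for one speaker-verification call.
type VerifyFn = (snippetId: number) => Promise<void>;

// Runs verify() for each snippet strictly one after another.
// Each call is awaited before the next one starts, so two calls never overlap.
async function verifySequentially(
  snippetIds: number[],
  verify: VerifyFn,
  delayMs = 0,
): Promise<void> {
  for (const id of snippetIds) {
    await verify(id); // without this `await`, the loop would start every call immediately
    if (delayMs > 0) {
      // Optional pause between verifications (the thread used 5000 ms).
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Demo: a fake verify() that records when it starts and when it ends.
async function demo(): Promise<string[]> {
  const log: string[] = [];
  const fakeVerify: VerifyFn = async (id) => {
    log.push(`start ${id}`);
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    log.push(`end ${id}`);
  };
  await verifySequentially([1, 2, 3], fakeVerify);
  return log;
}
```

If the log ever shows two `start` entries in a row, the calls are overlapping. In my case they weren't, which is why the problem had to be somewhere else.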

It also seems to output an embedding, despite the error. Though perhaps it's outputting an older embedding? I'm going to check that next.

Answer 3 · 2024-09-02T09:01:14.000Z

I played around with dtypes. Then I realized that the issue is probably with using WebGPU in the first place.

Solution

I switched the verification model over to WASM, and bingo, now it runs fine. It's detecting multiple speakers again.
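A rough sketch of how the backend choice could be made per model, so that only the verification model falls back to WASM. `loadOptionsFor` and `WEBGPU_DENYLIST` are hypothetical helpers, not part of Transformers.js, and the list is simply based on what I saw in this issue.

```typescript
type Device = "webgpu" | "wasm";

interface LoadOptions {
  device: Device;
  dtype: "fp32";
}

// Assumption: a hand-maintained list of models that misbehaved on WebGPU.
// The library does not provide this; it only reflects this thread.
const WEBGPU_DENYLIST = new Set<string>(["Xenova/wavlm-base-plus-sv"]);

// Picks WebGPU only when it is available and the model is not on the list.
function loadOptionsFor(modelId: string, webgpuAvailable: boolean): LoadOptions {
  const useWebGPU = webgpuAvailable && !WEBGPU_DENYLIST.has(modelId);
  return { device: useWebGPU ? "webgpu" : "wasm", dtype: "fp32" };
}
```

The result is meant to be passed as the options argument when loading, e.g. `AutoModel.from_pretrained(modelId, loadOptionsFor(modelId, hasWebGPU))`. As far as I know, Transformers.js v3 accepts `device` and `dtype` there.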

Conclusion

So the conclusion is: the Xenova/wavlm-base-plus-sv model does not yet have WebGPU support.

WASM doesn't seem to be much slower, so I don't think it matters much. But just for completeness I'll rename this issue to 'Support wavlm-base-plus-sv with WebGPU'.

wespeaker-voxceleb-resnet34-LM

I also quickly swapped in onnx-community/wespeaker-voxceleb-resnet34-LM. When using WebGPU (with the default dtype or forced to FP32) it doesn't output errors, but the embeddings it returns don't seem useful. Here are some similarity score outputs (with the similarity threshold lowered to 0.5 instead of 0.95):

(expectation: two speakers)

This could just be implementation error on my part. But for now I'll be sticking with wavlm-base-plus-sv.
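To make the threshold comparison concrete, here is a small sketch of the kind of matching involved. It assumes the similarity score is cosine similarity between embeddings, which is an assumption for illustration and not necessarily the exact code I'm running. `cosineSimilarity` and `matchSpeaker` are illustrative names.

```typescript
// Cosine similarity between two embeddings of equal length.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("embeddings differ in length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Returns the index of the best-matching known speaker whose similarity is
// at or above the threshold, or -1 if the embedding looks like a new speaker.
function matchSpeaker(
  embedding: ArrayLike<number>,
  known: ArrayLike<number>[],
  threshold = 0.95,
): number {
  let best = -1;
  let bestScore = -Infinity;
  for (let i = 0; i < known.length; i++) {
    const score = cosineSimilarity(embedding, known[i]);
    if (score >= threshold && score > bestScore) {
      best = i;
      bestScore = score;
    }
  }
  return best;
}
```

With a threshold as low as 0.5, an embedding that sits halfway between two known speakers still gets matched to one of them, so a low threshold tends to merge speakers instead of separating them.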