[V3 proposal] Improved defaults for quantization and device selection
Feature request
Currently, Transformers.js V3 defaults to using the CPU (WASM) instead of the GPU (WebGPU) due to lack of support and instability across browsers (specifically Firefox and Safari, and Chrome on Ubuntu). However, this provides a poor user experience, since performance is left on the table. As browser support for WebGPU increases (currently ~70%), this will become more important, since users may experience poor performance even when better settings are available.
A better approach would be to use device: "auto" instead of device: null by default, which would select (1) the quantization and (2) the device based on the following:
- Browser support (e.g., whether WebGPU is enabled)
- Device capabilities (OS, mobile vs. desktop, fp16 support)
- Model architecture/type (e.g., BERT models are more likely to succeed than encoder-decoder models), since some models use ops that are not supported in WebGPU.
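To make the three criteria above concrete, here is a minimal sketch of what device: "auto" resolution could look like. It assumes a hypothetical detectCapabilities() helper built on the real WebGPU calls navigator.gpu.requestAdapter() and adapter.features.has("shader-f16"), plus a hypothetical selectDefaults() with illustrative heuristics (stay on WASM for mobile and for encoder-decoder models, and use fp16 on WebGPU only when shader-f16 is reported). None of these names or thresholds come from Transformers.js.

```javascript
// Hypothetical sketch of device: "auto" resolution; not the Transformers.js API.

/**
 * Probe browser capabilities. `nav` defaults to the global navigator, but a
 * stub can be passed in so the logic can be exercised outside a browser.
 */
async function detectCapabilities(nav = globalThis.navigator) {
  const caps = { webgpu: false, fp16: false, mobile: false };
  if (!nav) return caps;
  // NavigatorUAData.mobile is Chromium-only; a missing value is treated as desktop.
  caps.mobile = Boolean(nav.userAgentData && nav.userAgentData.mobile);
  if (!nav.gpu) return caps; // WebGPU not exposed (or disabled) in this browser
  try {
    const adapter = await nav.gpu.requestAdapter(); // resolves to null if no adapter
    if (adapter) {
      caps.webgpu = true;
      caps.fp16 = adapter.features.has("shader-f16");
    }
  } catch {
    // requestAdapter() can reject; leave webgpu = false so we fall back to CPU
  }
  return caps;
}

// Assumption for illustration: encoder-decoder models are treated as risky on
// WebGPU because some of their ops may be unsupported.
const WEBGPU_RISKY_ARCHITECTURES = new Set(["encoder-decoder"]);

/** Pick a device and quantization from detected capabilities and model type. */
function selectDefaults(caps, architecture) {
  const gpuOk =
    caps.webgpu && !caps.mobile && !WEBGPU_RISKY_ARCHITECTURES.has(architecture);
  if (gpuOk) {
    return { device: "webgpu", dtype: caps.fp16 ? "fp16" : "fp32" };
  }
  return { device: "wasm", dtype: "q8" }; // conservative CPU default
}
```

In a browser, the result could then be passed to the v3 pipeline options, e.g. pipeline(task, model, { device, dtype }); the dtype strings used here ("fp16", "fp32", "q8") are assumed to match what that option accepts.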
Motivation
Improve user experience and performance with better defaults
Your contribution
Will work with @FL33TW00D on this
Current logic for session selection: https://github.com/xenova/transformers.js/blob/6505abb164a3eea1dd5e80e56a72f7d805715f0a/src/models.js#L148-L262
Some thoughts:
- Most users of Transformers.js will want the optimal device to be selected for them, using Device.AUTO.
- Some advanced users will want to force a specific ExecutionProvider to be used.
- We need to support both of the above use cases.
Currently, the distinction between our DEVICE_TYPES and ORT (ONNX Runtime) execution providers is quite blurry.
I propose:
- Create a class Device, which is a Transformers.js device.
- Simplify DEVICE_TYPES to handle only Auto, CPU, GPU, and NPU.
- Use conversion functions to convert from Device -> ORTBackend, etc.
This class will encapsulate all of the logic with respect to devices, including converting a Device into the required ExecutionProvider. It will also implement the above flowchart to ensure that users get the best experience.
The class Device will primarily expose CPU, GPU, and NPU, but users will also be able to directly provide an ORTExecutionProvider to skip our whole flowchart and force their required EP.
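One way to support both use cases (automatic selection for most users, a forced EP for advanced users) is sketched below. The { executionProvider } wrapper, resolveExecutionProviders(), and the stubbed resolveAuto() are hypothetical; the point is that an explicitly provided EP is passed through untouched, while device types go through the (here heavily simplified) flowchart.

```javascript
// Hypothetical sketch; the "auto" flowchart is reduced to a one-line stub.

const DEVICE_TYPES = Object.freeze({ AUTO: "auto", CPU: "cpu", GPU: "gpu", NPU: "npu" });

// Assumed device -> onnxruntime-web execution provider mapping.
const DEVICE_TO_EPS = Object.freeze({
  cpu: ["wasm"],
  gpu: ["webgpu", "wasm"],
  npu: [{ name: "webnn", deviceType: "npu" }, "wasm"],
});

// Stand-in for the selection flowchart; a real version would probe the browser.
function resolveAuto(caps) {
  return caps.webgpu ? DEVICE_TYPES.GPU : DEVICE_TYPES.CPU;
}

/**
 * Accept either a Transformers.js device type or an explicit ORT execution
 * provider wrapped as { executionProvider }. The explicit form bypasses the
 * flowchart and is passed through unchanged, with no fallback added.
 */
function resolveExecutionProviders(choice, caps = { webgpu: false }) {
  if (choice && typeof choice === "object" && "executionProvider" in choice) {
    return [choice.executionProvider]; // advanced users: forced EP
  }
  const type = choice === DEVICE_TYPES.AUTO ? resolveAuto(caps) : choice;
  const eps = DEVICE_TO_EPS[type];
  if (!eps) throw new Error(`Unknown device: ${choice}`);
  return eps;
}
```

Wrapping the forced EP in its own object, rather than accepting bare strings like "webgpu" as a device, keeps the two namespaces from blurring together again.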
We should create a class Device and keep the enum DEVICE_TYPES here. The enum should be changed so that no ORT-specific values bleed into it.
Congrats on the v3 release! I have a couple of questions about WebNN.
@xenova: I see that the respective device family is listed here: transformers.js/src/utils/devices.js, line 14 (commit 6505abb).
@FL33TW00D A follow-up to my question above: there is no WebNN in your device selection diagram. Since it is highly specialized and can leverage both NPUs and GPUs, could it become the first choice in the future (once the API stabilizes and it is available in all mainstream browsers without flags)? What do you think?
I'm asking because I have some tech sessions scheduled where I plan to present on "client-side AI" in the browser, with a focus on WebNN. I plan to use Transformers.js v3 for the demo (I started experimenting with WebNN in it right after it landed in the v3 alpha). Thanks!
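As a sketch of the ordering suggested in the question, a WebNN-first preference list might look like the following. It only checks whether navigator.ml (WebNN) and navigator.gpu (WebGPU) are exposed; the function name and the choice of deviceType: "npu" are assumptions, and the presence of navigator.ml alone does not guarantee that an NPU is actually used.

```javascript
// Hypothetical sketch of a WebNN-first preference order; not current Transformers.js behavior.

/**
 * Return onnxruntime-web style execution providers in preference order:
 * WebNN if navigator.ml is exposed, then WebGPU if navigator.gpu is exposed,
 * then WASM as an always-available CPU fallback.
 */
function preferredExecutionProviders(nav = globalThis.navigator) {
  const eps = [];
  if (nav && nav.ml) {
    // WebNN can target NPUs or GPUs; "npu" is an illustrative choice here.
    eps.push({ name: "webnn", deviceType: "npu" });
  }
  if (nav && nav.gpu) {
    eps.push("webgpu");
  }
  eps.push("wasm");
  return eps;
}
```

Keeping WASM last means the list still yields a working provider in browsers where WebNN is behind a flag and WebGPU is unavailable.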