WGPU issues on MacBook M1
DavidGOrtega opened this issue · 5 comments
As a TVM user I'm very excited about this project because of its use of burn and its access to native WGPU. Personally speaking, this is the way to go.
However, my tests are very discouraging: WGPU seems to be performing worse than CPU.
WGPU
cargo run --release --bin transcribe --features wgpu-backend medium audio16k.wav en transcription.txt
Running `target/release/transcribe medium audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
...
Chunk 0: Hello, I am the whisper machine learning model. If you see this as text, then I am working properly.
infer took: 49665 ms
CPU
cargo run --release --bin transcribe medium audio16k.wav en transcription.txt
Running `target/release/transcribe medium audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
...
Chunk 0: Hello, I am the whisper machine learning model. If you see this as text, then I am working properly.
infer took: 19517 ms
Transcription finished.
The code was slightly modified:
fn main() {
    // Backend and device are chosen at compile time via cargo features.
    cfg_if::cfg_if! {
        if #[cfg(feature = "wgpu-backend")] {
            type Backend = WgpuBackend<AutoGraphicsApi, f32, i32>;
            let device = WgpuDevice::BestAvailable;
        } else if #[cfg(feature = "torch-backend")] {
            type Backend = TchBackend<f32>;
            let device = TchDevice::Cpu;
        }
    }

    ...

    // Time only the transcription call itself.
    let start_time = Instant::now();
    let (text, tokens) = match waveform_to_text(&whisper, &bpe, lang, waveform, sample_rate) {
        Ok((text, tokens)) => (text, tokens),
        Err(e) => {
            eprintln!("Error during transcription: {}", e);
            process::exit(1);
        }
    };
    let end_time = Instant::now();

    let elapsed_time_ms = end_time.duration_since(start_time).as_millis();
    println!("infer took: {} ms", elapsed_time_ms);
The same roughly 3x gap shows up for the tiny model: tiny on CPU vs tiny on WGPU.
Might it just not be optimised for my machine? Or is something not working at all?
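One thing I can try to rule out is the graphics API that AutoGraphicsApi picks. This is only a sketch, and it assumes burn-wgpu exposes a Metal marker type alongside AutoGraphicsApi (worth checking for the exact version in use); pinning it explicitly would remove one variable from the comparison:

use burn_wgpu::{WgpuBackend, WgpuDevice};
// Assumption: `Metal` is one of burn-wgpu's GraphicsApi marker types;
// the exact import path may differ between versions.
use burn_wgpu::Metal;

type Backend = WgpuBackend<Metal, f32, i32>;

fn main() {
    // BestAvailable should already resolve to the Apple GPU on an M1,
    // but requesting Metal explicitly removes the auto-detection step.
    let device = WgpuDevice::BestAvailable;
    let _ = device;
    // ... load the model on `device` and run the same transcription as above
}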
MPS performance is also quite bad, similar to WGPU:
type Backend = TchBackend<f32>;
let device = TchDevice::Mps;
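To double-check that MPS is actually available in my libtorch build, something like the following should work, assuming the tch version in use exposes tch::utils::has_mps():

// Sanity check that libtorch was built with MPS support.
// Assumption: this tch version provides tch::utils::has_mps().
fn main() {
    println!("MPS available: {}", tch::utils::has_mps());
    // If this prints false, TchDevice::Mps cannot give a real GPU path.
}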
We need to review/profile whether the stft (featurizer) is slow. I believe it was implemented manually, and it might be slow compared to specialized libraries.
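A quick way to check that without a full profiler is to time the featurizer on its own. This is just a sketch; compute_mel_spectrogram is a hypothetical stand-in for whatever the crate's actual stft/mel entry point is called:

use std::time::Instant;

// Hypothetical stand-in for the crate's stft/mel featurizer; the real
// function name and signature in whisper-burn may differ.
fn compute_mel_spectrogram(waveform: &[f32], _sample_rate: usize) -> Vec<f32> {
    waveform.to_vec() // placeholder for stft + mel filterbank work
}

fn main() {
    let waveform = vec![0.0f32; 16_000 * 30]; // 30 s of dummy 16 kHz audio
    let start = Instant::now();
    let _mel = compute_mel_spectrogram(&waveform, 16_000);
    println!("featurizer took: {} ms", start.elapsed().as_millis());
    // Timing the featurizer and the model forward pass separately would show
    // whether the stft or the backend dominates the ~49 s seen above.
}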
Thanks for the reply @antimora. I'm going to profile and see.
I haven't yet prioritized optimization. Caching should speed up inference significantly. I also don't think the burn-wgpu backend has been heavily optimized yet; you might want to check with its maintainers.
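For context, the caching I mean is the usual key/value caching in autoregressive decoding: the projections for already-generated tokens are stored and reused instead of recomputed at every step. A minimal, backend-agnostic sketch of the idea (plain vectors standing in for tensors, not burn's API):

// Minimal sketch of key/value caching for autoregressive decoding.
struct KvCache {
    keys: Vec<Vec<f32>>,   // one entry per already-decoded token
    values: Vec<Vec<f32>>,
}

impl KvCache {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    // Each step only the newest token's projections are computed and
    // appended; attention then reads the whole cache instead of
    // recomputing every previous token.
    fn append(&mut self, key: Vec<f32>, value: Vec<f32>) {
        self.keys.push(key);
        self.values.push(value);
    }
}

fn main() {
    let mut cache = KvCache::new();
    for step in 0..3 {
        cache.append(vec![step as f32; 4], vec![step as f32; 4]);
    }
    println!("cached {} decoder steps", cache.keys.len());
}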