invalid sample: TooWide when running audio transcription example
SimunKaracic opened this issue · 2 comments
SimunKaracic commented
I'm trying to run the audio transcription example:
cargo run --example audio_transcription
using a file I recorded with sox:
sox -d audio.wav
and then converted with this command:
sox audio.wav -r 16000 -c 1 audio16k1.wav
I get the same error with either audio file:
thread 'main' panicked at 'invalid sample: TooWide', examples/audio_transcription.rs:51:24
Any ideas on what I'm doing wrong?
Full output:
cargo run --example audio_transcription
Finished dev [unoptimized + debuginfo] target(s) in 0.02s
Running `target/debug/examples/audio_transcription`
whisper_init_from_file_with_params_no_state: loading model from '/Users/foo/Documents/whisper.cpp/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU buffer size = 147.46 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 14.86 MB
whisper_init_state: compute buffer (encode) = 85.99 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
thread 'main' panicked at 'invalid sample: TooWide', examples/audio_transcription.rs:51:24
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
tazz4843 commented
Seems like an issue with hound, as the line that panics doesn't touch whisper_rs code yet. You may have written a floating-point audio file? In that case the reader would be attempting to read 32 bits per f32 sample instead of the 16 bits an i16 sample needs, resulting in a TooWide error.
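One way to check this guess is to look at what the WAV header actually declares. The sketch below is not part of the example or of hound; it is a std-only illustration, and the names `WavFormat`, `parse_wav_format` and `is_16bit_int_pcm` are made up for it. It assumes a plain RIFF/WAVE file and only reads the first 16 bytes of the `fmt ` chunk. As far as I know, hound documents TooWide as "the sample has more bits than the destination type", so a declared width above 16 bits would be consistent with the panic whether the samples are float or integer.

```rust
/// The fields of the WAV `fmt ` chunk that matter for this error.
#[derive(Debug, PartialEq)]
pub struct WavFormat {
    /// 1 = integer PCM, 3 = IEEE float, 0xFFFE = WAVE_FORMAT_EXTENSIBLE
    /// (the real format then sits in a sub-format GUID this sketch ignores).
    pub format_tag: u16,
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

/// Walks the RIFF chunks until it finds `fmt ` and decodes its fixed part.
/// Returns None if the bytes are not a RIFF/WAVE file or the chunk is missing.
pub fn parse_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            if size < 16 || body + 16 > bytes.len() {
                return None;
            }
            // All fields are little-endian; offsets are relative to the chunk body.
            let u16_at = |o: usize| u16::from_le_bytes([bytes[body + o], bytes[body + o + 1]]);
            let sample_rate = u32::from_le_bytes([
                bytes[body + 4],
                bytes[body + 5],
                bytes[body + 6],
                bytes[body + 7],
            ]);
            return Some(WavFormat {
                format_tag: u16_at(0),
                channels: u16_at(2),
                sample_rate,
                bits_per_sample: u16_at(14),
            });
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}

/// True only for plain integer PCM of at most 16 bits, i.e. samples that fit
/// in an i16. Extensible files report false because the sub-format is not decoded.
pub fn is_16bit_int_pcm(fmt: &WavFormat) -> bool {
    fmt.format_tag == 1 && fmt.bits_per_sample <= 16
}
```

To try it, read the file with `std::fs::read("audio16k1.wav")` and pass the bytes to `parse_wav_format`. hound's `WavReader::spec()` should report the same information (`bits_per_sample` and `sample_format`) if you would rather stay within the example's dependencies. If the width turns out to be above 16 bits, re-running the conversion with sox's `-b 16 -e signed-integer` options is one likely fix, since the original command only changes the sample rate and channel count.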
tazz4843 commented
Gonna go ahead and close this, feel free to reopen if you still have issues.