tazz4843/whisper-rs

invalid sample: TooWide when running audio transcription example

SimunKaracic opened this issue · 2 comments

I'm trying to run the audio transcription example:

cargo run --example audio_transcription

using a file I recorded with sox (`sox -d audio.wav`) and then converted with this command: `sox audio.wav -r 16000 -c 1 audio16k1.wav`.

I get the same error, with either of the audio files:

thread 'main' panicked at 'invalid sample: TooWide', examples/audio_transcription.rs:51:24

Any ideas on what I'm doing wrong?

Full output:

cargo run --example audio_transcription
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/examples/audio_transcription`
whisper_init_from_file_with_params_no_state: loading model from '/Users/foo/Documents/whisper.cpp/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU buffer size =   147.46 MB
whisper_model_load: model size    =  147.37 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   14.86 MB
whisper_init_state: compute buffer (encode) =   85.99 MB
whisper_init_state: compute buffer (cross)  =    4.78 MB
whisper_init_state: compute buffer (decode) =   96.48 MB
thread 'main' panicked at 'invalid sample: TooWide', examples/audio_transcription.rs:51:24
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Seems like an issue with hound, as the panicking line doesn't touch whisper_rs code yet. It seems that you may have written a floating-point audio file? The example would then be attempting to read 32 bits for an f32 sample instead of the 16 bits required for i16 samples, resulting in a TooWide error.
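To confirm what the file actually contains, one option is to look at the WAV header directly. The following is a minimal, std-only sketch (it does not use hound or whisper_rs); the names `WavFormat`, `parse_wav_format`, and `fits_i16_pcm` are hypothetical helpers made up for illustration. It assumes a standard RIFF/WAVE layout with a `fmt ` chunk of at least 16 bytes, and it only reports the fields relevant here: the format tag (1 = integer PCM, 3 = IEEE float) and the bits per sample.

```rust
/// Fields of a WAV `fmt ` chunk that matter when diagnosing a sample-width error.
/// (Hypothetical struct for illustration; not part of hound or whisper_rs.)
#[derive(Debug, PartialEq)]
pub struct WavFormat {
    /// Format tag: 1 = integer PCM, 3 = IEEE float.
    pub audio_format: u16,
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

/// Walk the RIFF chunks in `bytes` and decode the first `fmt ` chunk found.
/// Returns `None` if the data is not RIFF/WAVE or has no complete `fmt ` chunk.
pub fn parse_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            // The basic `fmt ` payload is 16 bytes, all little-endian.
            let f = bytes.get(body..body + 16)?;
            return Some(WavFormat {
                audio_format: u16::from_le_bytes([f[0], f[1]]),
                channels: u16::from_le_bytes([f[2], f[3]]),
                sample_rate: u32::from_le_bytes([f[4], f[5], f[6], f[7]]),
                bits_per_sample: u16::from_le_bytes([f[14], f[15]]),
            });
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        pos = body + size + (size & 1);
    }
    None
}

/// True when samples can be read as i16: integer PCM, at most 16 bits wide.
/// Deliberately conservative: any other format tag (float, extensible) is rejected.
pub fn fits_i16_pcm(fmt: &WavFormat) -> bool {
    fmt.audio_format == 1 && fmt.bits_per_sample <= 16
}
```

Calling `parse_wav_format(&std::fs::read("audio16k1.wav")?)` and printing the result should show whether the file is wider than 16 bits. If it is, re-converting with an explicit sample format may help, for example `sox audio.wav -r 16000 -c 1 -b 16 -e signed-integer audio16k1.wav` (assuming the original conversion kept the recording's bit depth).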

Gonna go ahead and close this, feel free to reopen if you still have issues.