V-Sekai/godot-whisper

.wav file issues

JBlank19 opened this issue · 5 comments

Good day!

The audio transcription node does not work with most .wav files, for example the .wav files Godot itself produces when recording from the mic. It does work with the capture node, though. It seems to be an issue with the format of the input data.

fire commented

As far as I know we wrote the capture node because we weren't able to get the record node to work 3-4 years ago.

@fire My god, I just wasted like half a day trying to figure out why the recorded wav doesn't work...

fire commented

@AllenDang Here is the original design documentation: godotengine/godot-proposals#2013

A .wav file will not work directly, because the API expects the raw sound buffer data you would get after decoding the .wav:
Array transcribe(PackedFloat32Array buffer, String initial_prompt, int audio_ctx);

If you just read the bytes of the .wav file, I think they would include extra encoding data (such as the file header) on top of the samples.

I will update the README to say that .wav files don't work directly and that transcribe currently only accepts a float32 buffer. For .wav, you have to decode it yourself.
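
To illustrate what "decode it yourself" means, here is a minimal sketch in Python (chosen only for illustration; it is not godot-whisper or Godot code). It assumes an uncompressed 16-bit PCM .wav, skips the header via the standard `wave` module, and scales each sample to a float in [-1.0, 1.0). The helper name `wav_to_float32` is hypothetical. The sketch does not resample, so the returned sample rate still has to match whatever the transcriber expects.

```python
import array
import sys
import wave


def wav_to_float32(source):
    """Decode a 16-bit PCM .wav into mono float32 samples in [-1.0, 1.0).

    `source` is a file path or a binary file-like object.
    Returns (samples, sample_rate). Hypothetical helper, for illustration only.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        # readframes() returns only the sample bytes, without the file header.
        raw = wav.readframes(wav.getnframes())

    # WAV stores samples as little-endian signed 16-bit integers.
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()

    # Average the channels of each frame to get mono, then scale to floats.
    samples = array.array("f")
    for i in range(0, len(pcm) - channels + 1, channels):
        frame = pcm[i:i + channels]
        samples.append(sum(frame) / (channels * 32768.0))
    return samples, rate
```

The resulting float array is the kind of data the `buffer` argument of `transcribe` is described as taking above; how to build a `PackedFloat32Array` from a recorded stream inside Godot is not covered by this sketch.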