shirayu/whispering

"No audio backend available" & VAD related error (Win10, Pytorch, worked on Sept27)

Hyenadae opened this issue · 6 comments

Hello Shirayu / Whispering contributors.

I've been testing out this fork for the past week without issue, but after upgrading to the latest release I now get errors and cannot use it, apparently because of a change in the audio backend. I am on Windows 10 21H2 with torch 1.12.1 (I was on 1.13-dev before today). I usually update this software by downloading the GitHub zip and running pip install ./ on the extracted directory, which has worked for the past few versions. I use VBAudioCable with '--mic 0' so that it listens to videos or other audio sources for live transcription.

PC: RTX 3080Ti & Ryzen 5700X

My full command line (in PowerShell/Python) is:
whispering --language en --model medium.en -n 80 --allow-padding --mic 0 --device cuda

Some of the errors seen are:

torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.

torch\nn\modules\module.py:1130: UserWarning: operator () profile_node %1178 : int[] = prim::profile_ivalue(%1176)
does not have profile information (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\graph_fuser.cpp:108.)
return forward_call(*input, **kwargs)

After using --no-vad, the errors ('profile information' / operator error) went away and transcription continued. This time, though, I also get the 'transcriber.transcribe:269 WARNING -> Padding is not expected while speaking' warning, which did not show up before. I assume this is because you are now supposed to use VAD instead of padding to keep the AI's 30-second window?

If someone could help me understand how the options now work (with all the development) for good/best 'live' transcription, I would appreciate it. I found that changing -n from 160 (stock / no -n?) to 80 gives decent speed with minimal errors, and that results were worse, with text cut off, at lower values.

For the "No audio backend is available" warning on Windows, you can use SoundFile according to this.
pip install PySoundFile
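
To confirm that the install took effect, a quick check along these lines may help. This is only a sketch: it assumes PySoundFile is imported under the module name soundfile, and the helper name has_module is made up for illustration. It only tests whether the modules can be found, not whether torchaudio actually selects SoundFile as its backend.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PySoundFile is imported as `soundfile` (assumption stated above).
    for module in ("soundfile", "torchaudio"):
        status = "found" if has_module(module) else "missing"
        print(f"{module}: {status}")
```

If soundfile is reported as missing, pip most likely installed it into a different Python environment than the one that runs whispering.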

I'm facing the second error warning message as well.

Hello!

First, these are not "errors" but "warnings".

torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.

torch\nn\modules\module.py:1130: UserWarning: operator () profile_node %1178 : int[] = prim::profile_ivalue(%1176)
does not have profile information (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\graph_fuser.cpp:108.)
return forward_call(*input, **kwargs)

I suspect the torchaudio version is the problem.
Is your torchaudio version the right one?
You can check it with this command:

pip show torchaudio

Try installing the build that supports your version of CUDA.
https://pytorch.org/get-started/locally/
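
As a rough way to compare the installed builds, the sketch below reads the versions of torch and torchaudio from the package metadata and extracts the local build tag (the part after "+", such as cu116). The helper names installed_version and build_tag are hypothetical, and a matching tag is only a hint that the two wheels were built for the same CUDA version, not a guarantee that they are compatible.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def build_tag(version: Optional[str]) -> Optional[str]:
    """Return the local build tag, e.g. 'cu116' from '0.12.1+cu116'."""
    if version is None or "+" not in version:
        return None
    return version.split("+", 1)[1]


if __name__ == "__main__":
    torch_version = installed_version("torch")
    audio_version = installed_version("torchaudio")
    print(f"torch:      {torch_version}")
    print(f"torchaudio: {audio_version}")
    if build_tag(torch_version) != build_tag(audio_version):
        print("Build tags differ; the two packages may target different CUDA versions.")
```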

Second, "WARNING -> Padding is not expected while speaking" is shown if you use --allow-padding.
That warning is not related to VAD.
If you use that option, Whisper analyzes each audio segment even if it lasts less than 30 seconds, by padding it with zeros.
https://github.com/shirayu/whispering/blob/master/whispering/transcriber.py#L275
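
To illustrate what zero padding means here, the following is a minimal sketch, not whispering's actual code (see the linked transcriber.py for that). It assumes 16 kHz mono samples held in a plain Python list and a 30-second window, which are the values Whisper works with; the function name pad_to_window is made up for this example.

```python
from typing import List

SAMPLE_RATE = 16_000                      # samples per second
WINDOW_SECONDS = 30                       # length of the window Whisper expects
N_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 480000 samples per window


def pad_to_window(samples: List[float]) -> List[float]:
    """Return exactly N_SAMPLES samples: trim long input, zero-pad short input."""
    if len(samples) >= N_SAMPLES:
        return samples[:N_SAMPLES]
    return samples + [0.0] * (N_SAMPLES - len(samples))


if __name__ == "__main__":
    five_seconds = [0.1] * (5 * SAMPLE_RATE)
    padded = pad_to_window(five_seconds)
    # 5 seconds of audio followed by 25 seconds of silence (zeros).
    print(len(five_seconds), "->", len(padded))
```

With padding, a short segment can be analyzed right away; the trade-off is that most of the window is silence.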

Third, please read here about -n.
https://github.com/shirayu/whispering#parse-interval
If you still have questions, feel free to ask me again.

Note: Whisper assumes a 30-second interval as input. So, without --allow-padding, whispering does not request analysis from Whisper until 30 seconds have elapsed.
However, it is useful to show temporary transcriptions for short intervals. (Partial analysis feature (#8))

Hi, thanks for the advice. My torchaudio is version 0.12.1+cu116, and installing PySoundFile removed the warnings. I'm still getting those torchaudio related warnings, but only at the start when I first run the program, and they do not affect further decoding. One behavior change I noticed is that the "Listening" countdown and estimate now force-scroll the PowerShell window down, instead of letting it scroll freely while the program transcribes. It's a very minor problem, and overall the setup is working as before. Is text-file output for transcriptions coming soon? :)

Great.
To suppress the countdown, please use --no-progress.
(It needs testing on Windows and macOS (#18))

The --output option has already been implemented. (#14)
Try it :-D

I added PySoundFile to the dependencies for Windows. (cfc8e28)
Thank you for your reports!