shirayu/whispering

Set a proper value for ``-n``

shirayu opened this issue · 4 comments

Too small a value for -n produces no output, while too large a value consumes a lot of memory.
Set a proper value for -n and emit a warning when the value is too small.
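The warning proposed above could look roughly like the following. This is only a sketch, not the project's actual code: the function name `parse_num_block`, the `dest` name `num_block`, and the `min_recommended` parameter are hypothetical, and the threshold of 160 is simply the value suggested later in this thread.

```python
import argparse
import warnings


def parse_num_block(argv, min_recommended=160):
    """Parse -n and warn when it is below a recommended minimum.

    NOTE: hypothetical helper for illustration only. The threshold
    (160) is an assumption taken from the discussion in this thread.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", dest="num_block", type=int, required=True)
    args = parser.parse_args(argv)

    # Reject values that can never work.
    if args.num_block <= 0:
        parser.error("-n must be a positive integer")

    # Warn, but do not fail, for values that are probably too small.
    if args.num_block < min_recommended:
        warnings.warn(
            f"-n {args.num_block} is smaller than {min_recommended}; "
            "transcription may produce no output",
            stacklevel=2,
        )
    return args.num_block
```

Warning instead of failing keeps small values usable for experiments while still telling the user why nothing may be printed.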

I tried several -n values. In all cases but one, nothing was output to the console. Only in one try, with -n = 10, did I get something transcribed. The first result was in the wrong language; the second was the correct transcription. I was not able to reproduce it, though, i.e. I normally do not get any transcription.

[2022-09-23 20:29:26,345] transcriber._deal_timestamp DEBUG -> Length of consecutive: 0
0.00->2.00	無限に
[2022-09-23 20:29:26,347] transcriber._deal_timestamp DEBUG -> Length of buffer: 0
[2022-09-23 20:29:26,347] transcriber.transcribe DEBUG -> Last rest_start=None
[2022-09-23 20:29:26,349] cli.transcribe_from_mic DEBUG -> Segment: 1
[2022-09-23 20:29:26,353] transcriber.transcribe DEBUG -> seek=0, timestamp=2.0, rest_start=None
[2022-09-23 20:29:32,840] transcriber.transcribe DEBUG -> Result: temperature=0.00, no_speech_prob=0.24, avg_logprob=-0.80
[2022-09-23 20:29:32,840] transcriber._deal_timestamp DEBUG -> Length of consecutive: 0
2.00->4.00	 It is okay.
[2022-09-23 20:29:32,840] transcriber._deal_timestamp DEBUG -> Length of buffer: 0
[2022-09-23 20:29:32,840] transcriber.transcribe DEBUG -> Last rest_start=None
[2022-09-23 20:29:32,843] cli.transcribe_from_mic DEBUG -> Segment: 2
[2022-09-23 20:29:32,846] transcriber.transcribe DEBUG -> seek=0, timestamp=4.0, rest_start=None

@fantinuoli Did you set a proper value for --language?
If the language is English, you need to set --language en like this:

poetry run whisper_streaming --language en --model base -n 20

I added instructions about that to the README (e9e286d).

I also found a bug in --language!
I fixed it in 9cd80ab.

pad_or_trim returns a tensor of shape torch.Size([1, 80, 3000]).
While speaking, padding is not expected.

With -n 160, the shape is torch.Size([1, 80, 3000]).
So a value of 160 or larger is expected.
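One possible way to read these numbers is sketched below. The audio constants (16 kHz sample rate, hop length 160, 30-second chunks, hence 3000 mel frames) are Whisper's. The value `SAMPLES_PER_BLOCK = 3000` is an assumption about how -n maps to audio samples, not something confirmed in this thread, and the helper names are hypothetical.

```python
# Whisper's audio constants.
SAMPLE_RATE = 16_000                      # samples per second
HOP_LENGTH = 160                          # samples per mel frame
CHUNK_LENGTH = 30                         # seconds per model input
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE    # 480000 samples
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3000 mel frames

# ASSUMPTION: each unit of -n corresponds to 3000 audio samples.
SAMPLES_PER_BLOCK = 3000


def mel_frames_for(num_block: int) -> int:
    """Mel frames produced by one block of audio for a given -n."""
    return num_block * SAMPLES_PER_BLOCK // HOP_LENGTH


def padding_frames(num_block: int) -> int:
    """Frames that would have to be padded to reach N_FRAMES."""
    return max(0, N_FRAMES - mel_frames_for(num_block))


def min_num_block_without_padding() -> int:
    """Smallest -n whose audio fills a whole chunk (ceiling division)."""
    return -(-N_SAMPLES // SAMPLES_PER_BLOCK)


if __name__ == "__main__":
    for n in (20, 160):
        seconds = n * SAMPLES_PER_BLOCK / SAMPLE_RATE
        print(n, seconds, mel_frames_for(n), padding_frames(n))
```

Under that assumption, -n 160 corresponds to 30 seconds of audio and exactly 3000 frames, so no padding is needed, while -n 20 covers 3.75 seconds and most of the input would be padding.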