Bug: Transcribing media ends with exclamation marks

Question

Opened this issue 2 months ago · 10 comments

What happened?

The transcript of a 1h multi speaker file generates the following output:
00:00 --> 01:20
Speaker 1:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:20 --> 01:28
Speaker 1:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:28 --> 01:39
Speaker 1:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:40 --> 01:41
Speaker 1:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:43 --> 01:44
Speaker 1:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:44 --> 01:54
Speaker 1:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
01:54 --> 01:57
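
For anyone triaging similar reports, here is a minimal sketch of how output like the above could be flagged automatically. It is not part of vibe; the function name `degenerate_segments` is hypothetical, and the parser assumes the three-line segment layout shown in this report (timestamp line, `Speaker N:` line, text line).

```python
import re

# Timestamp line such as "00:00 --> 01:20"; an optional hours field is an assumption.
TIMESTAMP = re.compile(r"^(?:\d+:)?\d{2}:\d{2} --> (?:\d+:)?\d{2}:\d{2}$")


def degenerate_segments(transcript: str) -> list[str]:
    """Return the timestamp lines of segments whose text is one repeated character."""
    flagged = []
    current = None  # timestamp line of the segment currently being read
    for raw in transcript.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = line
        elif current and line and not line.startswith("Speaker"):
            # Text that collapses to a single distinct character (e.g. "!!!!") is suspicious.
            if len(line) > 1 and len(set(line)) == 1:
                flagged.append(current)
            current = None
    return flagged


sample = """00:00 --> 01:20
Speaker 1:
!!!!!!!!!!!!
01:20 --> 01:28
Speaker 1:
Hello there
"""
print(degenerate_segments(sample))  # prints ['00:00 --> 01:20']
```

A check like this only detects the symptom; it says nothing about whether the cause is the GPU backend, the driver, or the model.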

Steps to reproduce

Step one: load a file longer than 1 h into the app.
Step two: set the number of speakers to 8 and the language to German.
Step three: start transcription.
I use an AMD 7700 XT; maybe that's the reason.

What OS are you seeing the problem on?

Windows

Relevant log output

App Version: vibe 2.6.3
Commit Hash: d24ffccb0d05ea822ff1a3a6edb3b9871be9f368
Arch: x86_64
Platform: windows
Kernel Version: 10.0.19045
OS: windows
OS Version: 10.0.19045
Cuda Version: n/a
Models: ggml-medium.bin
Default Model: "C:\\Users\\Me\\AppData\\Local\\github.com.thewh1teagle.vibe\\ggml-medium.bin"
Cargo features: vulkan


{
    "avx": {
        "enabled": true,
        "support": true
    },
    "avx2": {
        "enabled": true,
        "support": true
    },
    "f16c": {
        "enabled": true,
        "support": true
    },
    "fma": {
        "enabled": true,
        "support": true
    }
}

Answer 1 · 2024-11-13T01:55:09.000Z

Please show me an example YouTube video where this happens, or upload the audio, and tell me which language to choose so I can reproduce it.

Answer 2 · 2024-11-13T21:17:08.000Z

Hi, the language doesn't really matter. Whether I choose "auto detect language", "german" or "english", the output is all exclamation marks.

The audio or video file doesn't matter in my case either: different files and formats all resulted in the same problem.
I even switched from the AMD Pro drivers to the Gaming drivers, and nothing changed.
I am sure you will be able to transcribe anything fine, just as I can with the CPU model (except that it's really slow).
Is there anything else I can provide to help?

Answer 3 · 2024-11-13T21:34:43.000Z

Maybe related to ggerganov/whisper.cpp#2400

Answer 4 · 2024-11-16T02:01:13.000Z

I have the same issue when transcribing audio clips longer than ~8 seconds. Vulkan build, 7900 XTX, Windows 10.

Answer 5 · 2024-11-25T04:20:33.000Z

Could it be related to this issue?
ggerganov/llama.cpp#10434

Answer 6 · 2024-11-25T13:46:49.000Z

Could it be related to this issue?

Totally. Do you experience the same issue? I can try updating whisper.cpp in vibe and release a beta version.

Answer 7 · 2024-11-27T18:56:05.000Z

I released a beta version with the new code:
https://github.com/thewh1teagle/vibe/releases/download/v2.6.7/vibe_2.6.7_x64-setup.exe
Let me know if the problem is fixed.

Answer 8 · 2024-11-27T22:06:57.000Z

Hi,
thanks for the update.
The first attempt transcribed "audio audio audio audio audio" and then crashed.
The second one failed instantly with "Boundary error:
Error: Non-negative timestamp expected"

After that, I couldn't close the "Error A bug happened" dialog, even when clicking "close".

This happens with various audio file inputs after a few seconds, even after reinstalling. GPU usage doesn't go above 3-4%.

Answer 9 · 2024-11-28T11:45:11.000Z

With <=2.6.6 I experience the same problem with exclamation marks. With version 2.6.7 I get "pulp" or the negative timestamp error, depending on the audio file. I have an AMD Ryzen 9 7940HS with an IPU but no GPU.

Answer 10 · 2024-12-15T17:56:37.000Z

I had the same problem. What worked for me was uninstalling the amdvlk and lib32-amdvlk drivers
and leaving only the vulkan-radeon and lib32-vulkan-radeon drivers.
This was on Arch Linux.
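
To see which Vulkan drivers are registered before and after a change like this, a small sketch along the following lines may help. It is an assumption-laden illustration, not something from this thread: the helper name `list_vulkan_icds` is hypothetical, the default directory `/usr/share/vulkan/icd.d` is a common location for Vulkan driver manifests on Linux but may differ on your system, and the script only reads the manifests without changing anything.

```python
import json
from pathlib import Path


def list_vulkan_icds(icd_dir: str = "/usr/share/vulkan/icd.d") -> dict[str, str]:
    """Map each Vulkan ICD manifest file name to the driver library it points at."""
    found: dict[str, str] = {}
    directory = Path(icd_dir)  # assumed default location; pass another path if needed
    if not directory.is_dir():
        return found
    for manifest in sorted(directory.glob("*.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed manifests
        icd = data.get("ICD") if isinstance(data, dict) else None
        if isinstance(icd, dict) and icd.get("library_path"):
            found[manifest.name] = icd["library_path"]
    return found


if __name__ == "__main__":
    for name, library in list_vulkan_icds().items():
        print(f"{name}: {library}")
```

If more than one AMD driver manifest shows up, the application may be picking a different driver than expected, which would be consistent with the fix described above.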