RomanKlimov/faster-whisper-acceleration

Error with proceeding chunks

jbellic opened this issue · 2 comments

Hi,

Initially I had big issues using ffmpeg on a Windows machine; I fixed them by manually adjusting the path in the cmd parameter.
Now I get the following error for a test MP3:

ffmpeg version N-110363-g2aad9765ef-20230424 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (crosstool-NG 1.25.0.152_89671bf)
  configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreetype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-chromaprint --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --disable-libmfx --enable-libvpl --enable-openal --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enable-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20230424
  libavutil      58.  6.100 / 58.  6.100
  libavcodec     60. 10.100 / 60. 10.100
  libavformat    60.  5.100 / 60.  5.100
  libavdevice    60.  2.100 / 60.  2.100
  libavfilter     9.  5.100 /  9.  5.100
  libswscale      7.  2.100 /  7.  2.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, mp3, from 'rte.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    encoder         : Lavf58.76.100
  Duration: 00:00:44.54, start: 0.025056, bitrate: 178 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.13
  Stream #0:1: Video: png, rgb24(pc, gbr/unknown/unknown), 1280x720, 90k tbr, 90k tbn (attached pic)
    Metadata:
      comment         : Other
Output #0, mp3, to 'C:\Users\WORKST~1\AppData\Local\Temp\tmpg404_qs8.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    TSSE            : Lavf60.5.100
  Stream #0:0: Video: png, rgb24(pc, gbr/unknown/unknown), 1280x720, q=2-31, 90k tbr, 90k tbn (attached pic)
    Metadata:
      comment         : Other
  Stream #0:1: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.13
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
  Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
[mp3 @ 0000017a4ade2040] No packets were sent for some of the attached pictures.
[out#0/mp3 @ 0000017a4add4c00] video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[out#0/mp3 @ 0000017a4add4c00] Output file is empty, nothing was encoded
frame=    0 fps=0.0 q=-1.0 Lsize=       1kB time=-577014:32:22.77 bitrate=N/A speed=N/A    
ffmpeg version N-110363-g2aad9765ef-20230424 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (crosstool-NG 1.25.0.152_89671bf)
  configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreetype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-chromaprint --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --disable-libmfx --enable-libvpl --enable-openal --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --enable-vulkan --enable-libshaderc --enable-libplacebo --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20230424
  libavutil      58.  6.100 / 58.  6.100
  libavcodec     60. 10.100 / 60. 10.100
  libavformat    60.  5.100 / 60.  5.100
  libavdevice    60.  2.100 / 60.  2.100
  libavfilter     9.  5.100 /  9.  5.100
  libswscale      7.  2.100 /  7.  2.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, mp3, from 'rte.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    encoder         : Lavf58.76.100
  Duration: 00:00:44.54, start: 0.025056, bitrate: 178 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.13
  Stream #0:1: Video: png, rgb24(pc, gbr/unknown/unknown), 1280x720, 90k tbr, 90k tbn (attached pic)
    Metadata:
      comment         : Other
Output #0, mp3, to 'C:\Users\WORKST~1\AppData\Local\Temp\tmpsx4rhuot.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    TSSE            : Lavf60.5.100
  Stream #0:0: Video: png, rgb24(pc, gbr/unknown/unknown), 1280x720, q=2-31, 90k tbr, 90k tbn (attached pic)
    Metadata:
      comment         : Other
  Stream #0:1: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.13
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
  Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
[out#0/mp3 @ 000001a818d4b140] video:276kB audio:696kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.058001%
frame=    1 fps=0.0 q=-1.0 Lsize=     972kB time=00:00:44.48 bitrate= 179.0kbits/s speed=2.89e+03x    
Format mp3 detected only with low score of 1, misdetection possible!
Failed to read frame size: Could not seek to 1026.
Traceback (most recent call last):
  File "C:\Users\workstation\Desktop\faster-whisper-acceleration-main\parallelization.py", line 153, in <module>
    result = transcribe_audio(input_audio, max_processes, silence_threshold="-20dB", silence_duration=2, model=model)
  File "C:\Users\workstation\Desktop\faster-whisper-acceleration-main\parallelization.py", line 134, in transcribe_audio
    segments = future.result()
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 391, in __get_result
    raise self._exception
  File "C:\Program Files\Python39\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\workstation\Desktop\faster-whisper-acceleration-main\parallelization.py", line 112, in transcribe_file
    segments, info = model.transcribe(file_path)
  File "C:\Users\workstation\Desktop\faster-whisper-acceleration-main\venv\lib\site-packages\faster_whisper\transcribe.py", line 215, in transcribe
    audio = decode_audio(
  File "C:\Users\workstation\Desktop\faster-whisper-acceleration-main\venv\lib\site-packages\faster_whisper\audio.py", line 45, in decode_audio
    with av.open(input_file, metadata_errors="ignore") as container:
  File "av\container\core.pyx", line 401, in av.container.core.open
  File "av\container\core.pyx", line 272, in av.container.core.Container.__cinit__
  File "av\container\core.pyx", line 292, in av.container.core.Container.err_check
  File "av\error.pyx", line 336, in av.error.err_check
av.error.ValueError: [Errno 22] Invalid argument: 'C:\\Users\\WORKST~1\\AppData\\Local\\Temp\\tmpg404_qs8.mp3'; last error log: [mp3] Failed to read frame size: Could not seek to 1026.

Process finished with exit code 1
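The report mentions fixing the ffmpeg problem by hard-coding the path in the cmd parameter. Below is a minimal sketch of a less brittle alternative. It assumes the chunking code shells out to ffmpeg through `subprocess`, which cannot be confirmed because parallelization.py is not shown here. The idea is to resolve the executable once and pass the arguments as a list. `resolve_ffmpeg`, `run_ffmpeg` and the `FFMPEG_BINARY` environment variable are hypothetical names and are not part of the project.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def resolve_ffmpeg(explicit_path: Optional[str] = None) -> str:
    """Return the path of an ffmpeg executable, or raise if none is found.

    Lookup order (an assumption for this sketch, not the project's behaviour):
    an explicit path, then the hypothetical FFMPEG_BINARY environment
    variable, then whatever `ffmpeg` resolves to on PATH.
    """
    for candidate in (explicit_path, os.environ.get("FFMPEG_BINARY")):
        if candidate and os.path.isfile(candidate):
            return candidate
    found = shutil.which("ffmpeg")  # on Windows this also matches ffmpeg.exe
    if found is None:
        raise FileNotFoundError(
            "ffmpeg not found: pass its full path or add its folder to PATH"
        )
    return found


def run_ffmpeg(args: List[str], explicit_path: Optional[str] = None) -> subprocess.CompletedProcess:
    # An argument list (no shell=True) keeps Windows paths that contain
    # spaces intact without any manual quoting.
    cmd = [resolve_ffmpeg(explicit_path), *args]
    return subprocess.run(cmd, check=True, capture_output=True)
```

For example, `run_ffmpeg(["-i", "rte.mp3", "out.wav"], r"C:\ffmpeg\bin\ffmpeg.exe")` would use that explicit path. The `C:\ffmpeg\bin` location is only an illustration. Because `check=True` is set, a failing ffmpeg call raises `CalledProcessError` right away instead of leaving a broken temp file behind.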

Hi @jbellic, is this still an issue? Do you get this error for every file or only for some of them? I cannot reproduce the problem, but it seems to be related to the path of the temp file, or maybe to access rights. Try to open the temp file C:\Users\WORKST~1\AppData\Local\Temp\tmpg404_qs8.mp3 manually to check that it exists, and try running the script again in admin mode.
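The check suggested above can also be scripted instead of done by hand. The sketch below assumes the chunks are written to temp files created with Python's `tempfile` module. The `tmpXXXXXXXX.mp3` names in the log suggest this, but the code is not shown. One Windows-specific pitfall is worth ruling out. According to the Python docs, a `tempfile.NamedTemporaryFile` that is still open cannot be opened a second time by name on Windows. Creating the path with `mkstemp` and closing the handle first lets an external process such as ffmpeg write to it. `make_temp_path` and `inspect_chunk` are hypothetical helpers.

```python
import os
import tempfile
from typing import Tuple


def make_temp_path(suffix: str = ".mp3") -> str:
    """Create an empty temp file and return its path with no handle left open."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    # Close the OS-level handle immediately so an external process
    # (ffmpeg) can open and overwrite the file on Windows.
    os.close(fd)
    return path


def inspect_chunk(path: str) -> Tuple[bool, str]:
    """Report whether a chunk file exists, is readable and is non-empty."""
    if not os.path.isfile(path):
        return False, "missing"
    if not os.access(path, os.R_OK):
        return False, "not readable"
    size = os.path.getsize(path)
    if size == 0:
        return False, "empty (0 bytes)"
    return True, f"{size} bytes"
```

Logging `inspect_chunk(path)` right before `model.transcribe(path)` would show whether each chunk exists and how large it is at the moment it is decoded. That is more informative than the PyAV seek error alone.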

Yes, the issue persists. I am trying to process a single MP3 of around one minute. The initial ffmpeg issue still occurs when ffmpeg-python is not used. By the way, I could see the file in the temp directory, but it was 0 bytes. Has your code been tested on a Windows machine?
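The near-empty file matches the first ffmpeg run in the log. There, the stream mapping copies the attached PNG cover (`Stream #0:1 -> #0:0 (copy)`), and ffmpeg warns that no packets were sent for the attached picture. The run ends with "Output file is empty, nothing was encoded". The negative `time=` value on the same line may also point to bad segment timestamps. The log reports `Lsize=1kB` for that run, so checking only for 0 bytes may not catch the failed chunk.

Below is a hedged sketch of two defensive changes. It assumes the chunking step builds one ffmpeg command per segment. The first change maps only the audio stream and rejects zero-length segments. The second skips chunk files that are missing or suspiciously small before `model.transcribe` is called. `chunk_command` and `usable_chunks` are hypothetical, and the 4096-byte threshold is an arbitrary assumption. Whether the cover art is the real root cause still needs to be verified.

```python
import os
from typing import Iterable, List


def chunk_command(ffmpeg: str, src: str, dst: str, start: float, end: float) -> List[str]:
    """Build an ffmpeg command that cuts [start, end) seconds, audio only."""
    if end <= start:
        # A zero- or negative-length segment would produce an empty chunk.
        raise ValueError(f"empty segment: start={start}, end={end}")
    return [
        ffmpeg, "-y",
        "-i", src,
        "-ss", f"{start:.3f}",  # output-side seek: both times are on the input timeline
        "-to", f"{end:.3f}",
        "-map", "0:a",          # map only audio; the attached PNG cover is not copied
        "-c:a", "copy",
        dst,
    ]


def usable_chunks(paths: Iterable[str], min_bytes: int = 4096) -> List[str]:
    """Keep chunk files that exist and exceed an (arbitrary) minimum size."""
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) >= min_bytes]
```

Inside `transcribe_file`, submitting only `usable_chunks(paths)` to the `ThreadPoolExecutor` would skip a failed cut. Without that filter, one bad chunk lets `future.result()` re-raise the PyAV error and abort the whole run.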