kkroening/ffmpeg-python

`ffmpeg-python` fails when used with joblib's `loky` backend.

charlienewey opened this issue · 0 comments

There is some unexpected behaviour that results in ffmpeg errors when using ffmpeg-python in conjunction with joblib.Parallel and the loky backend. I came across the issue while working with video files, but others have had problems working with audio as well.

In my case, the issue manifested as ffmpeg tasks failing with errors such as the following:

[h264 @ 0x55c4e0bd4a40] Invalid NAL unit size (14678337 > 57337).

Other processing backends (e.g. multiprocessing) work fine. I found some threads indicating that loky may cause garbage data to be sent to STDIN if an input stream is not explicitly configured (for example, as `subprocess.DEVNULL`).

Links relating to the problem:

I don't know what the nature of the fix should be. As far as I can see, there are a couple of options:

  • Explicitly setting problematic streams (i.e. input) to subprocess.DEVNULL if input=None in run and run_async.
  • Opening a bug report in joblib or loky

Either would work.
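
The first option could look roughly like the sketch below. This is not ffmpeg-python's actual `run` implementation: `run_command` is a hypothetical helper, and capturing stdout/stderr is a choice made here only so the result can be inspected. The child process is a small Python stand-in rather than ffmpeg, so the snippet runs without any media files.

```python
import subprocess
import sys


def run_command(args, input=None):
    """Run args without letting the child inherit the parent's stdin.

    Hypothetical helper: stdin is a pipe only when there is data to send,
    and is attached to the null device otherwise.
    """
    stdin = subprocess.PIPE if input is not None else subprocess.DEVNULL
    proc = subprocess.Popen(
        args,
        stdin=stdin,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(input)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args, out, err)
    return out, err


# Stand-in child that reports how many bytes it was able to read from stdin.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()))"]

empty_out, _ = run_command(child)                # stdin is the null device
piped_out, _ = run_command(child, input=b"abc")  # stdin is a pipe with 3 bytes
```

With ffmpeg-python, the argument list would come from `compile()` on the output stream, as in the "fixed" example further down.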

Reproducing issue

Minimal example:

import ffmpeg
from joblib import Parallel, delayed

def simple_function(filename):
    ffmpeg.input(filename).output("null", f="null").run()

filelist = ["myfile.mp4"]

Parallel(n_jobs=2, backend="loky", return_as="list")(
    delayed(simple_function)(filename) for filename in filelist
)

Output from example:

Output from ffmpeg process

(etl-tools-py3.11) ➜  ~ python3 mwe.py
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  WARNING: library configuration mismatch
  avcodec     configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libaribb24 --enable-liblensfun --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x561328d096c0] Found duplicated MOOV Atom. Skipped it
    Last message repeated 2 times
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '000502B16BF6AB9A207744D5AD87B7416804DF13FBFE5AA568F1C5FBDCA3DBEB74D63D6D9FF36671B49AC90A7E544E9DA2A099DE491DBDBF0A8C682084C3883B':
  Metadata:
    encoder         : Lavf57.83.100
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
  Duration: 00:00:00.60, start: 0.000000, bitrate: 11947 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x1024, 11132 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
[h264 @ 0x561328d48340] Invalid NAL unit size (14678337 > 57337).
[h264 @ 0x561328d48340] Error splitting the input into NAL units.
[h264 @ 0x561328d90b00] Invalid NAL unit size (18499905 > 72265).
[h264 @ 0x561328d90b00] Error splitting the input into NAL units.
[h264 @ 0x561328dad6c0] Invalid NAL unit size (14092353 > 55048).
[h264 @ 0x561328dad6c0] Error splitting the input into NAL units.
[h264 @ 0x561328dc9e40] Invalid NAL unit size (17872193 > 69813).
[h264 @ 0x561328dc9e40] Error splitting the input into NAL units.
[h264 @ 0x561328de6700] Invalid NAL unit size (14536257 > 56782).
[h264 @ 0x561328de6700] Error splitting the input into NAL units.
[h264 @ 0x561328e02f80] Invalid NAL unit size (18103105 > 70715).
[h264 @ 0x561328e02f80] Error splitting the input into NAL units.
[h264 @ 0x561328e1f840] Invalid NAL unit size (11375169 > 44434).
[h264 @ 0x561328e1f840] Error splitting the input into NAL units.
[h264 @ 0x561328e3c100] Invalid NAL unit size (27159105 > 106090).
[h264 @ 0x561328e3c100] Error splitting the input into NAL units.
[h264 @ 0x561328e589c0] Invalid NAL unit size (22849 > 89).
[h264 @ 0x561328e589c0] Error splitting the input into NAL units.
[h264 @ 0x561328e751c0] Invalid NAL unit size (11361601 > 44381).
[h264 @ 0x561328e751c0] Error splitting the input into NAL units.
[h264 @ 0x561328e91b40] Invalid NAL unit size (10649153 > 41598).
[h264 @ 0x561328e91b40] Error splitting the input into NAL units.
[h264 @ 0x561328eae400] Invalid NAL unit size (11123777 > 43452).
[h264 @ 0x561328eae400] Error splitting the input into NAL units.
[h264 @ 0x561328ecacc0] Invalid NAL unit size (13141057 > 51332).
[h264 @ 0x561328ecacc0] Error splitting the input into NAL units.
[h264 @ 0x561328ee7580] Invalid NAL unit size (18391361 > 71841).
[h264 @ 0x561328ee7580] Error splitting the input into NAL units.
Error while decoding stream #0:0: Invalid data found when processing input
    Last message repeated 13 times
Output #0, null, to 'null':
  Metadata:
    compatible_brands: isomiso2avc1mp41
    major_brand     : isom
    minor_version   : 512
    encoder         : Lavf58.29.100
    Stream #0:0(und): Video: wrapped_avframe, yuv420p, 1280x1024, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc58.54.100 wrapped_avframe
frame=    1 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.04 bitrate=N/A speed=   5x    
video:1kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Conversion failed!

Output from Python

joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 661, in wait_result_broken_or_wakeup
    result_item = result_reader.recv()
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/charlie/.pyenv/versions/3.11.2/lib/python3.11/multiprocessing/connection.py", line 250, in recv
    return _ForkingPickler.loads(buf.getbuffer())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Error.__init__() missing 2 required positional arguments: 'stdout' and 'stderr'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/charlie/mwe.py", line 11, in <module>
    Parallel(n_jobs=2, backend="loky")(
  File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1944, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1587, in _get_outputs
    yield from self._retrieve()
  File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1691, in _retrieve
    self._raise_error_fast()
  File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1726, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 735, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 753, in _return_or_raise
    raise self._result
joblib.externals.loky.process_executor.BrokenProcessPool: A result has failed to un-serialize. Please ensure that the objects returned by the function are always picklable.
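
The `TypeError` in the remote traceback looks like a secondary failure rather than the root cause: an exception raised in the worker has to be pickled and rebuilt in the parent, and Python rebuilds an exception by calling its class with `self.args` only. If the constructor requires extra positional arguments, that call fails. Below is a minimal sketch of the mechanism, using a hypothetical `StandInError` class whose `(cmd, stdout, stderr)` signature is assumed from the traceback message rather than taken from ffmpeg-python's source:

```python
class StandInError(Exception):
    """Hypothetical stand-in for an error type with required extra arguments."""

    def __init__(self, cmd, stdout, stderr):
        # Only the formatted message is forwarded to Exception, so
        # self.args ends up holding a single element.
        super().__init__("{} error".format(cmd))
        self.stdout = stdout
        self.stderr = stderr


def rebuild_failure():
    """Rebuild the exception the way unpickling would, and report the failure."""
    exc = StandInError("ffmpeg", None, b"Conversion failed!")
    reduced = exc.__reduce__()          # (class, args[, state]) as used by pickle
    cls, args = reduced[0], reduced[1]
    try:
        cls(*args)                      # called with one argument instead of three
    except TypeError as err:
        return str(err)
    return None


message = rebuild_failure()
print(message)
```

If this is what happens here, the `BrokenProcessPool` hides the original ffmpeg failure. Catching the error inside the worker and raising a plain exception that carries the stderr text is one way to keep it visible.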

Minimum "fixed" example

Launching the ffmpeg subprocess directly with an explicit stdin=subprocess.DEVNULL stops the issue from occurring.

import ffmpeg
import subprocess
from joblib import Parallel, delayed

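def simple_function(filename):
    strm = ffmpeg.input(filename).output("null", f="null")

    # Build the ffmpeg command line and launch it directly, so that stdin
    # is attached to the null device instead of being inherited from the worker.
    args = strm.compile()
    proc = subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,
        cwd=None,
    )
    # No input is passed: stdin is not a pipe, so there is nothing to write.
    out, err = proc.communicate()
    retcode = proc.poll()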

filelist = ["myfile.mp4"]

Parallel(n_jobs=2, backend="loky", return_as="list")(
    delayed(simple_function)(filename) for filename in filelist
)