`ffmpeg-python` fails when used with joblib's `loky` backend.
charlienewey opened this issue · 0 comments
There is some unexpected behaviour resulting in ffmpeg errors when using `ffmpeg-python` in conjunction with `joblib.Parallel` and the `loky` backend. I came across the issue while working with video files, but others have had problems working with audio as well.
In my case, the issue manifested as ffmpeg tasks failing with an error such as the one below:
[h264 @ 0x55c4e0bd4a40] Invalid NAL unit size (14678337 > 57337).
Other processing backends (e.g. `multiprocessing`) work fine. I found some threads indicating that `loky` may cause garbage data to be sent to STDIN if an input stream is not explicitly configured (for example, as `subprocess.DEVNULL`).
Links relating to the problem:
- joblib/loky#262 (comment)
- https://stackoverflow.com/questions/52007992/ffmpeg-transcoding-on-lambda-results-in-unusable-static-audio/52008583#52008583
- https://stackoverflow.com/questions/52724802/ffmpeg-on-aws-lambda-invalid-nal-unit-size
I don't know what the nature of the fix should be. As far as I can see, there are two options:
- Explicitly setting problematic streams (i.e. `input`) to `subprocess.DEVNULL` if `input=None` in `run` and `run_async`.
- Opening a bug report in `joblib` or `loky`.
Either would work.
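The first option can be sketched roughly as follows. This is only an illustration of the idea, not ffmpeg-python's actual code: `run_with_safe_stdin` is a hypothetical helper name, and it takes a plain argument list (such as the one returned by `compile()`) so that the stdin handling can be shown on its own. It assumes that `subprocess.DEVNULL` is the right default whenever no `input` is given.

```python
import subprocess
import sys


def run_with_safe_stdin(args, input=None):
    """Run a command without letting it inherit the parent's stdin.

    Hypothetical helper: if `input` is given it is piped to the child,
    otherwise the child's stdin is connected to DEVNULL.
    """
    stdin = subprocess.PIPE if input is not None else subprocess.DEVNULL
    proc = subprocess.Popen(
        args,
        stdin=stdin,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(input)
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with exit code {proc.returncode}")
    return out, err


if __name__ == "__main__":
    # Stand-in child process that reports how many bytes arrive on stdin.
    child = [sys.executable, "-c",
             "import sys; print(len(sys.stdin.buffer.read()))"]
    print(run_with_safe_stdin(child)[0])          # no input given
    print(run_with_safe_stdin(child, b"abc")[0])  # input piped to the child
```

With a real stream, the argument list would come from something like `ffmpeg.input(filename).output("null", f="null").compile()`.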
Reproducing issue
Minimal example:
import ffmpeg
from joblib import Parallel, delayed

def simple_function(filename):
    ffmpeg.input(filename).output("null", f="null").run()

filelist = ["myfile.mp4"]
Parallel(n_jobs=2, return_as="list")(
    delayed(simple_function)(filename) for filename in filelist
)
Output from example:
Output from ffmpeg process
(etl-tools-py3.11) ➜ ~ python3 mwe.py
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
WARNING: library configuration mismatch
avcodec configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libaribb24 --enable-liblensfun --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x561328d096c0] Found duplicated MOOV Atom. Skipped it
Last message repeated 2 times
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '000502B16BF6AB9A207744D5AD87B7416804DF13FBFE5AA568F1C5FBDCA3DBEB74D63D6D9FF36671B49AC90A7E544E9DA2A099DE491DBDBF0A8C682084C3883B':
Metadata:
encoder : Lavf57.83.100
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
Duration: 00:00:00.60, start: 0.000000, bitrate: 11947 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x1024, 11132 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
handler_name : VideoHandler
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
[h264 @ 0x561328d48340] Invalid NAL unit size (14678337 > 57337).
[h264 @ 0x561328d48340] Error splitting the input into NAL units.
[h264 @ 0x561328d90b00] Invalid NAL unit size (18499905 > 72265).
[h264 @ 0x561328d90b00] Error splitting the input into NAL units.
[h264 @ 0x561328dad6c0] Invalid NAL unit size (14092353 > 55048).
[h264 @ 0x561328dad6c0] Error splitting the input into NAL units.
[h264 @ 0x561328dc9e40] Invalid NAL unit size (17872193 > 69813).
[h264 @ 0x561328dc9e40] Error splitting the input into NAL units.
[h264 @ 0x561328de6700] Invalid NAL unit size (14536257 > 56782).
[h264 @ 0x561328de6700] Error splitting the input into NAL units.
[h264 @ 0x561328e02f80] Invalid NAL unit size (18103105 > 70715).
[h264 @ 0x561328e02f80] Error splitting the input into NAL units.
[h264 @ 0x561328e1f840] Invalid NAL unit size (11375169 > 44434).
[h264 @ 0x561328e1f840] Error splitting the input into NAL units.
[h264 @ 0x561328e3c100] Invalid NAL unit size (27159105 > 106090).
[h264 @ 0x561328e3c100] Error splitting the input into NAL units.
[h264 @ 0x561328e589c0] Invalid NAL unit size (22849 > 89).
[h264 @ 0x561328e589c0] Error splitting the input into NAL units.
[h264 @ 0x561328e751c0] Invalid NAL unit size (11361601 > 44381).
[h264 @ 0x561328e751c0] Error splitting the input into NAL units.
[h264 @ 0x561328e91b40] Invalid NAL unit size (10649153 > 41598).
[h264 @ 0x561328e91b40] Error splitting the input into NAL units.
[h264 @ 0x561328eae400] Invalid NAL unit size (11123777 > 43452).
[h264 @ 0x561328eae400] Error splitting the input into NAL units.
[h264 @ 0x561328ecacc0] Invalid NAL unit size (13141057 > 51332).
[h264 @ 0x561328ecacc0] Error splitting the input into NAL units.
[h264 @ 0x561328ee7580] Invalid NAL unit size (18391361 > 71841).
[h264 @ 0x561328ee7580] Error splitting the input into NAL units.
Error while decoding stream #0:0: Invalid data found when processing input
Last message repeated 13 times
Output #0, null, to 'null':
Metadata:
compatible_brands: isomiso2avc1mp41
major_brand : isom
minor_version : 512
encoder : Lavf58.29.100
Stream #0:0(und): Video: wrapped_avframe, yuv420p, 1280x1024, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
Metadata:
handler_name : VideoHandler
encoder : Lavc58.54.100 wrapped_avframe
frame= 1 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.04 bitrate=N/A speed= 5x
video:1kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Conversion failed!
Output from Python
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 661, in wait_result_broken_or_wakeup
result_item = result_reader.recv()
^^^^^^^^^^^^^^^^^^^^
File "/home/charlie/.pyenv/versions/3.11.2/lib/python3.11/multiprocessing/connection.py", line 250, in recv
return _ForkingPickler.loads(buf.getbuffer())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Error.__init__() missing 2 required positional arguments: 'stdout' and 'stderr'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/charlie/mwe.py", line 11, in <module>
Parallel(n_jobs=2, backend="loky")(
File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1944, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1587, in _get_outputs
yield from self._retrieve()
File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1691, in _retrieve
self._raise_error_fast()
File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 1726, in _raise_error_fast
error_job.get_result(self.timeout)
File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 735, in get_result
return self._return_or_raise()
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/charlie/.cache/pypoetry/virtualenvs/etl-tools-31KlaDBn-py3.11/lib/python3.11/site-packages/joblib/parallel.py", line 753, in _return_or_raise
raise self._result
joblib.externals.loky.process_executor.BrokenProcessPool: A result has failed to un-serialize. Please ensure that the objects returned by the function are always picklable.
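The `TypeError` in the first traceback looks like a secondary effect rather than the root cause: the worker raised an ffmpeg error, and that exception could not be rebuilt in the parent process. The sketch below reproduces the same kind of failure with a stand-in class. `CommandError` and `to_picklable` are hypothetical names, and the assumption (not confirmed here) is that ffmpeg-python's `Error` has a similar constructor that takes `cmd`, `stdout` and `stderr` but passes only a message to `Exception.__init__`.

```python
import pickle


class CommandError(Exception):
    """Stand-in for an exception whose __init__ needs extra arguments."""

    def __init__(self, cmd, stdout, stderr):
        # Only the message ends up in self.args, so unpickling later calls
        # CommandError(message) and fails on the missing arguments.
        super().__init__(f"{cmd} error (see stderr output for detail)")
        self.stdout = stdout
        self.stderr = stderr


def survives_pickle(exc):
    """Return True if the exception can be pickled and rebuilt."""
    try:
        pickle.loads(pickle.dumps(exc))
    except TypeError:
        return False
    return True


def to_picklable(exc):
    """Convert to a built-in exception that can cross process boundaries."""
    stderr = exc.stderr.decode(errors="replace") if exc.stderr else ""
    return RuntimeError(f"{exc}: {stderr}")
```

In a worker function, catching the library's exception and raising the converted one instead should let joblib report the underlying ffmpeg failure rather than `BrokenProcessPool`.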
Minimum "fixed" example
Launching the ffmpeg subprocess directly with an explicit `stdin=subprocess.DEVNULL` stops the issue from occurring.
import subprocess

import ffmpeg
from joblib import Parallel, delayed

def simple_function(filename):
    strm = ffmpeg.input(filename).output("null", f="null")
    # Build the ffmpeg command line without running it.
    args = strm.compile()
    # Give ffmpeg an explicit, empty stdin instead of inheriting the worker's.
    proc = subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,
        cwd=None,
    )
    # Nothing is piped, so communicate() simply waits for ffmpeg to exit.
    proc.communicate()
    return proc.returncode

filelist = ["myfile.mp4"]
Parallel(n_jobs=2, return_as="list")(
    delayed(simple_function)(filename) for filename in filelist
)