Hardware-accelerated decoding
I'd love to add this, but unless we can figure out a zero-copy mechanism to get the decoded texture into Godot in a native format, it seems this would not improve performance appreciably. Only QSV decoding showed a small benefit when dumping raw frames to a null device (NUL on Windows, /dev/null on Linux), and only on slower CPUs. Please post any findings to this thread if you find a case where rawvideo output is improved by hardware decoding.
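To put the copy cost in perspective, here is a back-of-the-envelope estimate of how large one decoded 1080p frame is in the pixel formats tested in this thread, and how much data that adds up to per second if every frame is copied once. It is a sketch that assumes 8-bit samples and tightly packed planes with no row padding or alignment; the function names are hypothetical.

```python
# Bytes per pixel as (numerator, denominator), assuming 8-bit samples and
# tightly packed planes with no row padding or alignment.
BYTES_PER_PIXEL = {
    "yuv420p": (3, 2),  # full-size Y plane + quarter-size U and V planes
    "yuv422p": (2, 1),  # full-size Y plane + half-size U and V planes
    "rgb32": (4, 1),    # one packed 32-bit pixel (3 color bytes + alpha/padding)
}


def frame_bytes(width: int, height: int, pix_fmt: str) -> int:
    """Size of one raw frame in bytes (even width/height assumed for 4:2:0)."""
    num, den = BYTES_PER_PIXEL[pix_fmt]
    return width * height * num // den


def bytes_per_second(width: int, height: int, pix_fmt: str, fps: float) -> float:
    """Raw data rate that must be moved if every frame is copied once."""
    return frame_bytes(width, height, pix_fmt) * fps


if __name__ == "__main__":
    for fmt in BYTES_PER_PIXEL:
        size = frame_bytes(1920, 1080, fmt)
        rate = bytes_per_second(1920, 1080, fmt, 30) / 1e6
        print(f"{fmt:8s} {size:>10,d} bytes/frame  {rate:7.1f} MB/s at 30 fps")
```

For example, a 1920x1080 rgb32 frame is 8,294,400 bytes, so 30 fps playback means roughly 249 MB/s that has to travel from system memory into a texture when there is no zero-copy path. That is about 2.7 times the yuv420p size, before counting the cost of the color conversion itself.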
Additionally, we should test with `-pix_fmt rgb32` placed before `-f rawvideo`, since that performs the RGB32 conversion required to display the frames in Godot.
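A small sketch of how the benchmark runs could be scripted. It builds the ffmpeg argument list with `-hwaccel` as an input option (before `-i`) and `-pix_fmt` as an output option (before `-f rawvideo` and the null device). The function name is hypothetical, and the hwaccel names in the example loop are only the ones tried in this thread; which ones actually work depends on the FFmpeg build and GPU (`ffmpeg -hwaccels` lists what a build supports).

```python
import sys
from typing import List, Optional


def build_ffmpeg_benchmark_cmd(
    input_path: str,
    hwaccel: Optional[str] = None,
    pix_fmt: Optional[str] = None,
    null_device: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg argv that decodes input_path and discards raw frames."""
    if null_device is None:
        # Windows discards output to NUL, POSIX systems to /dev/null
        null_device = "NUL" if sys.platform.startswith("win") else "/dev/null"
    cmd = ["ffmpeg", "-y"]
    if hwaccel:
        # -hwaccel is an input option, so it must come before -i
        cmd += ["-hwaccel", hwaccel]
    cmd += ["-i", input_path]
    if pix_fmt:
        # output options apply to the next output, so this precedes -f/NUL
        cmd += ["-pix_fmt", pix_fmt]
    cmd += ["-f", "rawvideo", null_device]
    return cmd


if __name__ == "__main__":
    # Print the commands for software decoding plus a few hwaccels
    for hw in [None, "qsv", "dxva2", "d3d11va"]:
        print(" ".join(build_ffmpeg_benchmark_cmd("out9-fhd.webm", hw, "rgb32")))
```

Passing each list to `subprocess.run` executes the runs one by one; keeping the argv as a list avoids shell quoting differences between Windows and Linux.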
The `-pix_fmt rgb32` rows for the i7-8650U on Windows in the table below show the expected result: QSV reached 173 fps vs 169 fps for software decoding and reduced the CPU load from 37% to 33%.
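Hardware | OS | Input | Output options | hwaccel | FPS | Speed | Video decode GPU (from Task Manager) |
---|---|---|---|---|---|---|---|
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | out9-fhd.webm | `-f rawvideo` | none | 743 | 29.7x | |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | out9-fhd.webm | `-f rawvideo` | cuda | 342 | 13.7x | NVIDIA |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | out9-fhd.webm | `-f rawvideo` | dxva2 | 269 | 10.8x | Intel |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | out9-fhd.webm | `-f rawvideo` | qsv | 582 | 23.3x | |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | out9-fhd.webm | `-f rawvideo` | d3d11va | 167 | 6x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Linux | out9-fhd.webm | `-f rawvideo` | none | 578 | 23.1x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Linux | out9-fhd.webm | `-f rawvideo` | vdpau | 514 | 20.6x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Linux | out9-fhd.webm | `-f rawvideo` | vaapi | 45 | 1.8x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Linux | out9-fhd.webm | `-f rawvideo` | drm | Device creation failed: -14. | | |
i7-8650U UHD Graphics 620 (KBL GT2) | Linux | out9-fhd.webm | `-f rawvideo` | opencl | 588 | 23.5x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Linux | out9-fhd.webm | `-f rawvideo` | cuvid | needs NVIDIA | | |
i7-8650U UHD Graphics 620 (KBL GT2) | Linux | out9-fhd.webm | `-f rawvideo` | cuda | needs NVIDIA | | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | out9-fhd.webm | `-f rawvideo` | none | 409 | 16.3x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | out9-fhd.webm | `-f rawvideo` | cuda | needs NVIDIA | | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | out9-fhd.webm | `-f rawvideo` | dxva2 | 189 | 7.5x | Intel |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | out9-fhd.webm | `-f rawvideo` | qsv | 406 | 16.3x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | out9-fhd.webm | `-f rawvideo` | d3d11va | 99 | 3.94x | Intel |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | out9-fhd.webm | `-f rawvideo` | opencl | 414 | 16.6x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | out9-fhd.webm | `-f rawvideo` | vulkan | 405 | 16.2x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-f rawvideo` | none | 397 | 13.2x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-f rawvideo` | dxva2 | 209 | 6.95x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-f rawvideo` | qsv | 430 | 14.3x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-f rawvideo` | d3d11va | 100 | 3.32x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-f rawvideo` | opencl | 382 | 12.7x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-f rawvideo` | vulkan | 411 | 13.7x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-pix_fmt rgb32 -f rawvideo` | none | 169 | 5.64x | |
i7-8650U UHD Graphics 620 (KBL GT2) | Windows | 8a41a2c3....webm | `-pix_fmt rgb32 -f rawvideo` | qsv | 173 | 5.76x | |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | 8a41a2c3....webm | `-f rawvideo` | none | 729 | 24.2x | |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | 8a41a2c3....webm | `-f rawvideo` | d3d11va | 142 | 4.75x | |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | 8a41a2c3....webm | `-f rawvideo` | qsv | 769 | 25.6x | |
i7-8750H GeForce GTX 1070 with Max-Q Design/PCIe/SSE2 | Windows | 8a41a2c3....webm | `-f rawvideo` | dxva2 | 352 | 11.7x | |
i5-5200U Intel HD Graphics 5500 | Windows | 8a41a2c3....webm | `-f rawvideo` | none | 234 | 7.8x | |
i5-5200U Intel HD Graphics 5500 | Windows | 8a41a2c3....webm | `-f rawvideo` | qsv | 237 | 7.91x | |
i5-5200U Intel HD Graphics 5500 | Windows | 8a41a2c3....webm | `-pix_fmt rgb32 -f rawvideo` | none | 87 | 2.9x | |
i5-5200U Intel HD Graphics 5500 | Windows | 8a41a2c3....webm | `-pix_fmt rgb32 -f rawvideo` | qsv | 87 | 2.9x | |
i5-5200U Intel HD Graphics 5500 | Windows | 8a41a2c3....webm | `-pix_fmt yuv420p -f rawvideo` | qsv | 245 | 8.18x | |
i5-5200U Intel HD Graphics 5500 | Windows | 8a41a2c3....webm | `-pix_fmt yuv420p -f rawvideo` | none | 248 | 8.26x | |
i5-5200U Intel HD Graphics 5500 | Windows | 8a41a2c3....webm | `-pix_fmt yuv422p -f rawvideo` | none | 111 | 3.71x | |

Each row ran `ffmpeg -y [-hwaccel <hwaccel>] -i <input> <output options> NUL` (or `/dev/null`), e.g. `ffmpeg -y -hwaccel cuda -i out9-fhd.webm -f rawvideo NUL`; "none" means software decoding.

Input videos:
- out9-fhd.webm: vp9 (Profile 0), yuv420p(tv, progressive), 1920x1080, SAR 1:1 DAR 16:9, 25 fps, 250 kb/s, 49.60 s
- 8a41a2c3....webm: vp9 (Profile 0), yuv420p(tv), 1920x1080, SAR 229:252 DAR 916:567, 30 fps, 68 kb/s, 14m 15.94s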
The Windows FFmpeg build used was the 2020-11-23 release from https://github.com/BtbN/FFmpeg-Builds/releases (the shared-vulkan variant where possible).
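The FPS and Speed columns appear to match the `fps=` and `speed=` fields of the final progress line that ffmpeg prints on stderr (e.g. 743 fps / 25 fps = 29.7x for out9-fhd.webm). A minimal sketch of pulling those two numbers out of captured stderr, assuming the stats line looks like `frame= 1240 fps=743 ... speed=29.7x`; exact spacing varies between FFmpeg versions, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the "fps=" and "speed=...x" fields of one ffmpeg progress line.
_STATS_RE = re.compile(r"fps=\s*([\d.]+).*?speed=\s*([\d.]+)x")


def parse_ffmpeg_stats(stderr_text: str) -> Optional[Tuple[float, float]]:
    """Return (fps, speed) from the last complete progress line, or None."""
    result = None
    # ffmpeg rewrites its progress line with '\r', so split on both '\r' and '\n'
    for line in re.split(r"[\r\n]+", stderr_text):
        m = _STATS_RE.search(line)
        if m:  # lines with "speed=N/A" don't match and are skipped
            result = (float(m.group(1)), float(m.group(2)))
    return result
```

Combined with `proc = subprocess.run(cmd, capture_output=True, text=True)`, calling `parse_ffmpeg_stats(proc.stderr)` yields one table row per run.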
One possible option is to get the texture across using EGL or something similar, ideally without a copy (zero-copy)?
http://wiki.100ask.org/EGL_texture_0-copy
The mpv player looks like a good example of doing this with FFmpeg:
https://github.com/mpv-player/mpv/blob/80c4aaa2a4e7ada6530ad4f16172283cd82fcc1d/libmpv/render_gl.h#L133
It seems mpv can render directly into hardware contexts (e.g. DXVA2 via EGL):
https://github.com/mpv-player/mpv/blob/802f594a857c703ac88e946d14b69cd3b6eb6006/video/out/opengl/hwdec_dxva2egl.c#L320
https://github.com/mpv-player/mpv/blob/172146e9f7a231b5de21921d883612d18b13a717/video/decode/vd_lavc.c
Framebuffer objects may also be relevant here:
https://learnopengl.com/Advanced-OpenGL/Framebuffers
For Linux, it looks like we'd need to support both of these APIs (from https://wiki.debian.org/HardwareVideoAcceleration):
(prefer VA-API except with NVIDIA proprietary drivers)
VA-API - Supported on Intel, AMD, and NVIDIA (only via the open-source Nouveau drivers). Widely supported by software, including Kodi, VLC, MPV, Chromium, and Firefox. Main limitation is lacking any support in the proprietary NVIDIA drivers.
VDPAU - Supported fully on AMD and NVIDIA (both proprietary and Nouveau). Supported by most desktop applications like Kodi, VLC, and MPV, but has no support at all in Chromium or Firefox. Main limitations are poor and incomplete Intel support and not working with browsers for web video acceleration.
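The Debian guidance above boils down to a simple selection rule. This is a sketch of that rule only: the function name and the vendor strings are hypothetical, and the return values are the names ffmpeg accepts for `-hwaccel`. It ignores CUDA/NVDEC (the `cuda` rows in the table), which is another option with the proprietary NVIDIA driver.

```python
from typing import Optional


def pick_linux_hwaccel(gpu_vendor: str, nvidia_proprietary: bool = False) -> Optional[str]:
    """Choose an ffmpeg -hwaccel name following the Debian wiki guidance:
    prefer VA-API, except with the proprietary NVIDIA driver (use VDPAU)."""
    vendor = gpu_vendor.lower()
    if vendor == "nvidia":
        # The proprietary driver has no VA-API support; Nouveau supports VA-API
        return "vdpau" if nvidia_proprietary else "vaapi"
    if vendor in ("intel", "amd"):
        # VDPAU support on Intel is poor; AMD supports both, so prefer VA-API
        return "vaapi"
    # Unknown vendor: fall back to software decoding
    return None
```

Returning `None` rather than guessing keeps the caller on software decoding, which the benchmarks above show is rarely slower for rawvideo output anyway.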
Licensing considerations for libva, which includes binary blobs (this shouldn't be an issue as long as we avoid GPL licensing):
intel/libva#118