elFarto/nvidia-vaapi-driver

Is it actually worth it?

Opened this issue · 10 comments

When I watch YouTube/Twitch/other videos with "default CPU decoding" in web browsers, my GPU is:

GPU 210MHz MEM 405MHz TEMP 36°C

And CPU usage is around 20% on a 65 W CPU for 480p-720p videos.

When I use this nvidia-vaapi-driver in Firefox - even watching a single 480p video, while it is on screen:

 GPU 2460MHz MEM 8250MHz TEMP  42°C FAN   0% POW N/A / 115 W
 GPU[|                          3%] MEM[|||           0.906Gi/7.996Gi] DEC[       1%]

I mean, the boost from 210 MHz to 2460 MHz is not free in terms of power consumption, and the same goes for memory.
I cannot see the actual power consumption; the Nvidia driver always reports N/A.
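For reference, nvidia-smi can print the performance state, clocks and power draw directly, so anyone can reproduce these readings; something like:

    nvidia-smi --query-gpu=pstate,clocks.gr,clocks.mem,power.draw,temperature.gpu --format=csv -l 1

where -l 1 repeats the query every second (power.draw may still show N/A on some cards).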

It's worth it for 4K videos, that's for sure, but for 480p-720p... is it?

P.S. I just updated to the 550.54.14 Nvidia driver - this nvidia-vaapi-driver's GPU acceleration works in Wayland, and Wayland works much better now on Nvidia. There are still lots of bugs, especially with Chrome, but Wayland on Nvidia now works 100x better than on the 545 drivers.

P.P.S. Thanks for making something that actually works for GPU video acceleration in web browsers.

See #74. nvdec requires CUDA, and CUDA forces the GPU into at least the P2 state, while the actual hardware decoding works fine at P5. Both VDPAU and Vulkan video decode demonstrate this - they use the same video decoding hardware but don't force the power state to increase. But VDPAU can't be used to implement a VAAPI driver, and I shudder to think how complex a Vulkan implementation would look compared to the current nvdec-based one.
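To make that coupling concrete, here is a rough sketch (not the actual driver code) of the minimal NVDEC setup, using the CUDA driver API plus the nvcuvid header from the Video Codec SDK. The point is that the decoder cannot even be created without a CUDA context, and it is that context that drags the power state up:

    /* Rough sketch only - not the actual nvidia-vaapi-driver code.
     * Needs the CUDA driver API and the Video Codec SDK headers:
     *   gcc nvdec-sketch.c -lcuda -lnvcuvid
     */
    #include <cuda.h>
    #include <nvcuvid.h>

    int main(void) {
        CUdevice dev;
        CUcontext ctx;

        /* Step 1 is unavoidable: NVDEC is only reachable through CUDA,
         * and creating this context is what forces the GPU to P2, even
         * though the decode engine itself would be happy at P5. */
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        /* Step 2: create the decoder (parameters abbreviated). */
        CUVIDDECODECREATEINFO info = {0};
        info.CodecType           = cudaVideoCodec_H264;
        info.ChromaFormat        = cudaVideoChromaFormat_420;
        info.OutputFormat        = cudaVideoSurfaceFormat_NV12;
        info.ulWidth             = info.ulTargetWidth  = 1920;
        info.ulHeight            = info.ulTargetHeight = 1080;
        info.ulNumDecodeSurfaces = 4;
        info.ulNumOutputSurfaces = 1;

        CUvideodecoder dec;
        if (cuvidCreateDecoder(&dec, &info) == CUDA_SUCCESS) {
            /* ...cuvidDecodePicture() / cuvidMapVideoFrame() would go here... */
            cuvidDestroyDecoder(dec);
        }

        cuCtxDestroy(ctx);
        return 0;
    }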

Thanks, I see.
In the 550 Nvidia driver release notes they say:

Added support for the VK_KHR_video_encode_queue, VK_KHR_video_encode_h264, VK_KHR_video_encode_h265 and VK_KHR_video_maintenance1 extensions.

I also do not know how complex it is, or whether it actually works.

Found this: https://forums.developer.nvidia.com/t/remove-p2-forced-state-from-drivers/241998
And this: https://babeltechreviews.com/nvidia-cuda-force-p2-state/
It seems there is no way to turn it off on Linux, while on Windows it is possible.

TLDR:
I made this summary blog post - https://danilw.github.io/blog/nvidia_linux_gpu_video_accceleration_webbrowsers/

Hi all. I'm subbed to this issue because like OP, the forced P2 power state massively increases the power required to decode video, resulting in fan noise, where my CPU will do the job passively cooled. I've sadly been forced to uninstall this driver until nvidia fix this.

I read today, though, that OBS Studio will support nvenc natively in version 30.2, the beta being released today. I wonder if they might have some useful code to share, here?

I know that the interface to use nvenc on Windows did have this exact same issue, and after some time, and as a result of pressure from OBS users, the interface was modified by nvidia such that the P2 state was not enforced when the codec was in use. Perhaps we're fortunate enough that OBS' devs remembered this old problem and solved it on linux before it was rolled out :)

@pallaswept
nvenc - Nvidia video encoding
do not confuse it with
nvdec - Nvidia video decoding

OBS has supported nvenc on Linux with Nvidia for a very long time - 5+ years for sure.
And nvenc forcing P2 is completely acceptable, because video encoding is very compute-heavy.
But nvdec is video decoding - there is no reason to use P2 for such a low-compute task.

OBS Studio will support nvenc natively in version 30.2

Maybe you mean that the latest version of OBS will support "GPU video encode without copy to RAM" - because that is the only change I know of that was added related to GPU video encoding in OBS.

Maybe you mean that the latest version of OBS will support "GPU video encode without copy to RAM" - because that is the only change I know of that was added related to GPU video encoding in OBS.

My bad, I guess the news report was a little misleading as to the nature of the addition :( I was all hopeful .... but anyway, maybe there are still useful toys over in OBS land?

Surprisingly, I'm not confused - forcing P2 is not necessary even for video encoding using nvenc, because the dedicated silicon makes the power draw so low. But it's not the compute demand that is the reason to force P2 - the P2 state is enforced because that is the only state that can guarantee reliable results from CUDA. It's not that it needs more power to do encoding, it's that it needs to not go too fast, or it might error. Forced P2 for CUDA is mostly about limiting memory clocks.
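This is easy to see for yourself. The sketch below (assuming a working CUDA toolchain) does no compute at all - it only creates a context and sleeps - yet while it runs, nvidia-smi should report the pstate jumping to P2:

    /* Sketch: a CUDA context that does literally nothing.
     * Build: gcc p2-demo.c -lcuda
     * While it sleeps, run:
     *   nvidia-smi --query-gpu=pstate --format=csv
     * and the GPU should sit in P2 despite doing zero work. */
    #include <unistd.h>
    #include <cuda.h>

    int main(void) {
        CUdevice dev;
        CUcontext ctx;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev); /* context creation alone raises the pstate */
        sleep(30);                 /* zero work, clocks stay pinned anyway */
        cuCtxDestroy(ctx);
        return 0;
    }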

On Windows, it was once merely good practice to disable forced P2 for streaming/gaming uses, where real CUDA compute (and the stability ensured by the P2 power state) was not needed. GeForce Experience always bypassed (but did not disable) forced P2 during recording and encoding. Forcing P2 completely ruins gaming performance, because it drops you from the P0 you'd likely otherwise achieve and reduces memory clocks - to the tune of about 5-10% average FPS, and HUGE increases in n% minimum frametimes... it's a lot.

Because Windows gamers moving from Instant Replay to OBS Studio didn't know about the hidden 'disable force P2' ability in other tools, they suffered this performance loss. It's usually reported as 'recording/streaming makes me lose 10 FPS in game' or similar, or 'I need a second PC to stream from because streaming hurts performance'. So, under pressure from gamers and OBS, and with the newer nvenc iterations being marketed as better than CPU encoding (pitched at game streamers), nvidia made nvenc via nvapi work just like Instant Replay always did, bypassing the P2 enforcement. Now it is not only good practice, it is built into the API. Now, I know, that's Windows and this ain't... but I'm pretty sure it's not needed on linux either.

Anyway, my thinking here is that if OBS are doing that on windows, and they know it needs to be done here because forced P2 breaks their primary use-case (game streaming), then maybe they're doing it on linux too, and could be a useful resource? I'd say that it's entirely possible that it's just broken on linux and awaiting a fix from nvidia for their project, too, but.... maybe they have a secret trick. 🤞

Surprisingly, I'm not confused - forcing P2 is not necessary even for video encoding using nvenc, because the dedicated silicon makes the power draw so low.

You're correct - my context was "for low-end GPUs".

Because Windows gamers moving from Instant Replay to OBS Studio didn't know about the hidden 'disable force P2' ability in other tools, they suffered this performance loss.

This is very interesting, I did not know that.
Thanks for explaining!

No worries mate, I just happened to be so unfortunate as to be stuck on Windoze back then ;) Even there, we had to hack on the nvidia driver for a while.

I was digging around in the source of the app that I used to set this on windows, but didn't find anything useful for us on linux. It did occur to me that it might be possible to use Wine to call NVAPI DLLs, but I very much doubt it would work without a windows driver.
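For what it's worth, the knob those Windows tools flip is a driver-profile (DRS) setting written through NVAPI. A rough, untested sketch from memory is below; the setting id is a placeholder (nvidiaProfileInspector lists the setting as "CUDA - Force P2 State"), so this only shows the shape of the calls, not working code:

    /* Windows-only sketch of how profile tools toggle forced P2 via NVAPI.
     * Link against the NVAPI SDK (nvapi64.lib). The setting id below is a
     * PLACEHOLDER - look up the real "CUDA - Force P2 State" id in
     * nvidiaProfileInspector's settings list. */
    #include "nvapi.h"

    int main(void) {
        NvDRSSessionHandle session;
        NvDRSProfileHandle profile;

        NvAPI_Initialize();
        NvAPI_DRS_CreateSession(&session);
        NvAPI_DRS_LoadSettings(session);
        NvAPI_DRS_GetBaseProfile(session, &profile);

        NVDRS_SETTING s = {0};
        s.version         = NVDRS_SETTING_VER;
        s.settingId       = 0x0;              /* PLACEHOLDER id, see above */
        s.settingType     = NVDRS_DWORD_TYPE;
        s.u32CurrentValue = 0;                /* 0 = do not force P2 */

        NvAPI_DRS_SetSetting(session, profile, &s);
        NvAPI_DRS_SaveSettings(session);
        NvAPI_DRS_DestroySession(session);
        return 0;
    }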

I think we might have to just wait and pray that team green keeps up their recent pace in developing for linux, and great devs like our elFarto can back them up.

This is actually a very good question, and there is another thing to add - WHAT videos are you going to watch in Firefox (for example)?
It turns out that DRM-protected videos do not activate hardware acceleration in Firefox or in Chrome (except on Chrome OS), so if you are going to watch a streaming service that uses Widevine, all the effort of setting this up is pretty much pointless.
If it's YouTube or something like that, it's a different situation.
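For reference, "setting it up" in Firefox means roughly the following, going from memory of this repo's README (check the README for the authoritative list):

    # environment for Firefox (wrapper script or /etc/environment)
    LIBVA_DRIVER_NAME=nvidia
    NVD_BACKEND=direct
    MOZ_DISABLE_RDD_SANDBOX=1

    # and in about:config
    media.ffmpeg.vaapi.enabled = true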

You can disable the forced P2 state by setting disable_vrr_memclk_switch to 1 in the nvidia_modeset module, but then the GPU will run constantly at the P0 state even while CUDA programs are running. This could be useful for streamers, but not for video decoding.
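If that is the parameter, then as a modprobe option it would look something like this (untested sketch; the file name is just an example):

    # /etc/modprobe.d/nvidia-pstate.conf
    options nvidia_modeset disable_vrr_memclk_switch=1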

Another way to force P0 is to use PowerMizer, through nvidia-settings or kernel module parameters. It's nice because it works on X or Wayland.
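For the nvidia-settings route, the usual PowerMizer knob is the GpuPowerMizerMode attribute (1 = Prefer Maximum Performance); under X that is something like:

    nvidia-settings -a "[gpu:0]/GpuPowerMizerMode=1"

On Wayland the kernel module parameter route would be needed instead, since nvidia-settings assignments go through the X NV-CONTROL extension.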