terminatorul/NvStrapsReBar

right to rebar: pcie 2.0 & 3.0 gpus can do rebar


System

  • Motherboard: gigabyte x570 ud rev. 1.0
  • BIOS Version: f37 12/26/2022
  • GPU: zotac amp rtx 2080 ti tu102-300a rev. a1

Description

ive kept a close eye on rebar since it appeared and always waited for a mod like this.

now my questions are:

is rebar a pcie 3.0 specification?

can every pcie 3.0 gpu do full rebar?

why does capframex report full rebar in direct3d and does it mean i dont need an uefi / (v)bios mod to get the benefits of full rebar?

rebar

i have a zotac amp rtx 2080 ti here with a ryzen 5700x on a gigabyte x570 ud and 32gb 3600 dual rank ram. but since i dont have a vrr monitor yet (dont know if i will ever get one and rather get a 5800x3d for that price) and i run everything at 60hz/60fps @ 1080p ultra details im asking myself if i would benefit from this at all? would it improve latency, make things noticeable smoother or will it only help to achieve higher peak fps?

since ive optimized the bios and windows to the max everything runs butter smooth at max details so im very unsure if i should take the risk when there could be virtually no improvement also due to full rebar in d3d.

I could measure about 3% performance benefit in FPS, and 10-15% in PCI bandwidth test.

Other users measured 10% FPS increase in low-FPS max-quality (high texture) settings in Cyberpunk 2077 (but FPS was too low at that quality for the game to be playable, so the 10% increase is still not a real benefit).

Looks like PCI-SIG specifications are members-only and need to be purchased, but I can still see ReBAR introduced in April 2008 for PCIe v2.x. I might be wrong about it, though.

But the ReBAR extended capability is optional, and a lot of PCIe devices that are not GPUs have no use for it. I can definitely say that not every PCIe 3.0 GPU can do full ReBAR. For example, my 1050 Ti uses PCIe 3.0 but does not have the extended capability for ReBAR.
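To make "has the extended capability" concrete, here is a minimal sketch of how software can walk the PCIe extended capability list and look for the Resizable BAR capability (extended capability ID 0x0015). The two config spaces below are fabricated for illustration, and the helper names are hypothetical. On Linux a real dump can typically be read from `/sys/bus/pci/devices/<address>/config`, with root needed for the full 4 KiB.

```python
import struct

REBAR_EXT_CAP_ID = 0x0015  # Resizable BAR extended capability ID
EXT_CAP_START = 0x100      # extended capabilities begin after the first 256 bytes


def find_ext_capability(config_space: bytes, cap_id: int):
    """Return the offset of an extended capability, or None if it is absent."""
    offset = EXT_CAP_START
    visited = set()  # guards against a malformed, looping list
    while offset and offset not in visited and offset + 4 <= len(config_space):
        visited.add(offset)
        (header,) = struct.unpack_from("<I", config_space, offset)
        if header in (0, 0xFFFFFFFF):  # empty list or unreadable space
            return None
        if header & 0xFFFF == cap_id:  # bits 15:0 hold the capability ID
            return offset
        offset = (header >> 20) & 0xFFC  # bits 31:20 hold the next offset
    return None


def make_config_space(caps):
    """Build a fake 4 KiB config space from (offset, cap_id, next_offset) tuples."""
    space = bytearray(4096)
    for offset, cap_id, next_offset in caps:
        header = (next_offset << 20) | (1 << 16) | cap_id
        struct.pack_into("<I", space, offset, header)
    return bytes(space)


# Hypothetical devices: one chains AER (0x0001) to ReBAR (0x0015), one has AER only.
with_rebar = make_config_space([(0x100, 0x0001, 0x200), (0x200, 0x0015, 0)])
without_rebar = make_config_space([(0x100, 0x0001, 0)])

print(find_ext_capability(with_rebar, REBAR_EXT_CAP_ID))
print(find_ext_capability(without_rebar, REBAR_EXT_CAP_ID))
```

A device like the 1050 Ti mentioned above would correspond to the second case: the list is walked to its end and the capability is simply not there.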

If you do not have ReBAR enabled in firmware on the PCI bus, I do not see how Resizable Bar D3D can show ON, or what that can even mean.

So if you are interested in performance, you should still enable it in the firmware, no matter what the capframex report shows

I have read somewhere old systems can not benefit from ReBAR, if PCIe bus (or the PCIe device) is not fast enough.

But you have a recent enough system (X570 / Ryzen 5700X) that can still see benefit from ReBAR, although for NVIDIA it is likely around 3%. It depends on the game, some users can see higher margins

thanks for your answers. that was exactly what i was looking for.

yes it seems to improve everything the gpu does. it would also slightly increase performance even in my case with vsync 60 and i believe it would be even noticeable in terms of better input latency.

i would really like to go for it but im afraid it could fail and lose my windows installation or soft brick my board.

do you believe that sometime in the future there could be an easier / without modding or flashing a new bios or more automated way to activate it? would it be possible for the mainboard producers to include a mod like yours into their bios to enable rebar support officially or can it only be done locally?

maybe something similar to what is mentioned here:

https://github.com/xCuri0/ReBarUEFI/wiki/Common-issues-(and-fixes)#how-do-i-enable-resizable-bar-on-unsupported-amd-gpus- (PCI-E Resizable Bar Service Mode and / regedit tweak)

its really strange what capframex shows there. it did the same for my gtx 1080 which i had before upgrading to the 2080 ti.

rebar2

ive enabled above 4g and rebar in bios but nvcpl and gpuz report that its turned off.

so i believe what capframex shows is the capability of direct3d to access full vram by default with some gpus. we would probably have to ask the guys from capframex what it is thats showing there. as far as i can tell full rebar in d3d is always possible even with a4g and rebar turned off in bios. maybe this is also the reason that the results can be very similar between on and off.

Some discussion touched on the idea of using the GRUB bootloader instead of an EFI mod: xCuri0#118, but it is just too difficult technically, and I can not be sure in advance if it is even possible. So I see the need for such functionality, but I can not say if it will ever be implemented.

Other people showed concern about safety too xCuri0#89 (reply in thread), so I will give you the same answer: nobody bricked their board, and although NvStrapsReBar is a new project, the base project ReBarUEFI it is forked from has a long history of just working.

First, there is no risk of breaking your Windows installation.

Next, the one rule you need to stick to, if you just want to be absolutely sure, is to disable ReBAR (using NvStrapsReBar.exe) before making changes in UEFI Setup and before making hardware changes. But even if you forget, you can recover by clearing CMOS and by reverting the hardware changes, or replacing the GPU with a different model.

Or if you want to wait, I am still hoping to add checks to the DXE driver in a future version, for extra safety, to automatically disable ReBAR if settings have changed in UEFI Setup, or if hardware has changed.

I still think it is not possible for Direct3D to access the full VRAM without PCI ReBAR enabled, it must do it in chunks of 256 MiB at a time. Do you have more information about how this would be always possible in Direct3D ?
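As a rough illustration of what "chunks of 256 MiB at a time" means, the arithmetic below counts how many BAR-sized windows are needed to cover all of VRAM. The 11 GiB figure (RTX 2080 Ti) and the 16 GiB resized BAR are assumptions chosen for the example, not values taken from this thread.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def windows_needed(vram_bytes: int, bar_bytes: int) -> int:
    """Number of BAR-sized windows needed to cover all of VRAM (ceiling division)."""
    return -(-vram_bytes // bar_bytes)


vram = 11 * GIB  # assumed VRAM size for the example

print(windows_needed(vram, 256 * MIB))  # classic 256 MiB BAR: 44 windows
print(windows_needed(vram, 16 * GIB))   # resized BAR covering all VRAM: 1 window
```

With the small BAR the driver has to move the window around to reach different parts of VRAM; with a BAR at least as large as VRAM, everything is mapped at once.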

Wait ... your board has Q-Flash Plus function. It means you can recover the board even if somehow you brick your board firmware. You're all set.

ive asked in the tpu forums about the issue with capframex but it seems like no one knows about it. i definitely want to investigate it though. think i will contact the guys from capframex and ask them about it.

it also seems to not be wholly identical with full hw rebar. maybe d3d can only access full vram but it comes with other limitations. but im not sure what this is really about.

ill read the whole process through and it seems pretty safe all over. ive also flashed custom x79 roms and other things before.

yes that board and that whole pc is pretty delicious. i also have other pcs and gpus here in case of an emergency. ive been tempted and waiting for so long! ;)

can i just run the exe once and check on the current settings without making any changes?

main reason i wanted to improve performance with rebar is that i hoped it could maybe improve fps stability in vsync. there can be these dips with vsync on which shouldnt be there due to the frame buffer being present. triple buffering could also improve this but its only available in opengl and probably vulkan. as far as i know directx / d3d has never supported triple buffering.

so what i found was that doom eternal has an extra triple buffered vsync mode and the option which is called "present from compute" (hint says: present the final image from an asynchronous queue.) which adds a frame from an asynchronous queue and that just fixes all issues with vsync and frame drops and makes the game utterly smooth. but it seems like the game needs to specifically support it and a lot just dont.

doom

doom232

as you can see im running everything at super low temps and power consumption (~40-50° open case). when i play games cpu+gpu power draw is like 100-250w max all the time while the cpu regularly boosts to max speed 4850mhz. ive tried to optimize for the best power-performance efficiency.

i also dont get any noticeable improvement from nvidia reflex on or boost which tells me im already around the lowest input latency possible. keyboard and mouse buffer did a lot to input latency and also installation with minimal install and advanced tweaks from nvcleanstall (msix irq mode) (msiutilv3):

https://forums.guru3d.com/threads/windows-line-based-vs-message-signaled-based-interrupts-msi-tool.378044/

https://www.techpowerup.com/download/microsoft-interrupt-affinity-tool/

Decrease mouse and keyboard buffer sizes (theres other very nice tweaks in there too):
https://github.com/DaddyMadu/Windows10GamingFocus/blob/ccddb5ce39a59127d6ded44b5a74771388bdc041/win10debloatandgamingtweaks.ps1#L2887

Yes, sure

very cool.

3333rebar

current config:
gpuz

another strange thing is this:

https://www.reddit.com/r/pcmasterrace/comments/yh6j1a/rtx_2080_ti_resizable_bar_supported_as_per_nvidia/

it shows an official nvidia chart where rebar is officially supported / enabled for rtx 2080 ti and 2080 super.

im really looking forward to future development on all of this. maybe it will be possible to make a bootable usb stick and get the driver loaded on the fly.

Yes, sure

so ive tried and contacted capframex twice but i didnt get any reply what the partial rebar is and what is displayed here in the ui. pretty strange.

@terminatorul

Next, the one rule you need to stick to, if you just want to be absolutely sure, is to disable ReBAR (using NvStrapsReBar.exe) before making changes in UEFI Setup and before making hardware changes. But even if you forget, you can recover by clearing CMOS and by reverting the hardware changes, or replacing the GPU with a different model.

Or if you want to wait, I am still hoping to add checks to the DXE driver in a future version, for extra safety, to automatically disable ReBAR if settings have changed in UEFI Setup, or if hardware has changed.

First part is now implemented, and v0.4 will automatically disable NvStrapsReBar if you make changes in UEFI Setup.

This is a safety measure because changes in UEFI settings bring changes to an internal address saved by NvStrapsReBar when ReBAR is enabled (the GPU BAR0 base address). Simply enabling NvStrapsReBar again will also update the saved address, and thus prevent using the old value.

This is also helpful to prevent users from accidentally enabling CSM or disabling the "Above 4G Decoding" option in UEFI Setup when NvStrapsReBar is already enabled and depends on those options.

i researched a little in the pcie 2.0 and 3.0 specs and boy the way theyre selling us rebar is just such a scam!

the points i am making here is that:

a) users have a "right to rebar" because its specified in pcie 2.0 and pcie 3.0 and they paid the full price expecting full functionality and full compatibility without any lack, limit or defect. by pcie specs this also includes the ability to have control over and fully configure all important and relevant aspects of the respective devices functions especially if theres an impact on performance or compatibility.

if a pcie device though has the function but is missing the ability to configure its bar via bios / system software as is specified in pcie specs this means its not fully compatible or not fully functional within pcie specs as was advertised at the moment of purchase.

if a user bought a pcie 2.0 or pcie 3.0 device that advertised full pcie compatibility and functionality yet the user has no way of configuring bar options that can impact performance negatively this represents a legal flaw and the device would have had to be sold with a rebate on price due to its flaw or with a note that at least had informed the user of the lack of the ability to configure / control bar options by system software as is specified in pcie specs which in turn again leads to decreased performance. yet the manufacturers have kept and obfuscated this information from the consumer with malintent well knowing there can be a negative impact on performance.

b) pcie products like gpus, mainboards etc. are in violation of pcie specifications whenever user ability to control rebar via system software (bios, os, driver, etc.) as is specified in pcie specs implementation note is obfuscated by manufacturers with the intent to resell a function to the user he has already paid for a second time thus scamming the consumer: 32 bit bar (128 byte up to 4gb) / (re)sizing / defining the bar size is always possible and available and active by system software (bios, driver, os, etc.) but manufacturers obfuscate the respective system software controls / bios option (resizable bar capability and control registers) on purpose with the intention to fool and mislead the user into believing he has to pay another extra fee to unlock an already present function and hidden option in the bios he has already paid full price for expecting full functionality, full compatibility and full performance. this is a clear indication of manipulation with malintent.

c) manufacturers have further deliberately chosen a strong sub optimal bar size contrary to their better knowing of performance requirements for gaming and its developments and necessities for users and then further on top even prohibit user ability to control rebar options (bar size) by system software well knowing this will lead to at least a minor or even strong negative performance impact on the users system of anywhere between 1-25% or even more.

first of all (re)bar was introduced in pcie 2.0 specs in the year 2006 and even updated in 2008 with 2.1. so this is super old and why has it been neglected all the time if its well known that theres a great impact on performance? why was there never a bios option in all those years, options in driver or any other conceivable way to make the bar size changeable for the user? pcie 3.0 is from 2010 and were like using 14 year old crap without even getting the full of what we paid for:

pcie2

second, bar devices can be initiated by legacy boot and so there shouldnt be any need for an extra uefi boot driver or anything else because everything can be handled legacy or by the system software (bios, os, driver, etc.):

pcie5

the pcie spec implementation note defines that:

  • system software is able to automatically negotiate, present and set an optimal bar size depending on application requirements and
  • bar size can be chosen or defined freely anywhere from 128 byte to 4 gigabyte for 32 bit starting with default pcie 2.0 bar function. pcie 2.1 then already includes the rebar function, which means dynamic resizing on the fly and also above 4g for 64 bit applications. any bar size greater than 128 byte can be considered re-sized bar (not sure what is the default value though), thus outing the respective device as rebar capable up to a certain amount of memory, until there will be negative effects from conflicts in addressing of resources and likely drops in performance or crashes etc.
  • a function of the system software is able to provide the user with an on-demand and on-the-fly choice of different sizes as well as automatically determining the largest possible bar size for optimal performance (it was well understood that a sub optimal size can lead to a negative performance impact)

pcie3

pcie4

further it is intended and specified that resizable bar capable pcie devices can be identified and controlled by system software:

The Resizable BAR Capability structure defines a PCI Express Extended Capability which is located in PCI Express Extended Configuration Space, that is, above the first 256 bytes, and is shown below in Figure 7-108. This structure allows devices with this capability to be identified and controlled. A Capability and a Control register is implemented for each BAR that is resizable.

in violation of this specification the manufacturers purposefully impede the users ability to identify and control the resizable bar capable devices by obfuscating the respective capability registers inside the system software (bios, os, driver, etc.) with the malintent to trick the user into paying an extra fee or buying an extra product to enable / unlock a function or option that is already present in the system and has been fully paid for already.

official pcie specs provide control for an on-the-fly / on-demand ability to choose a suitable bar size by system software that yields optimal performance for the respective application. even if system software does not provide the user with a choice of bar size an optimal bar size should always dynamically and automatically be negotiated per application. limiting bar size to a static value though that can never be negotiated by system software or customized / controlled by the user manually represents a violation of pcie specs and recommendations.

System software uses this capability in place of the above mentioned method of determining the resource size, and prior to assigning the base address to the BAR. Potential usable resource sizes are reported by the Function, and read, from the Resizable BAR Capability registers.
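To illustrate how "potential usable resource sizes are reported by the Function", here is a small sketch that decodes a Resizable BAR Capability register value. It assumes the original layout where bits 4 to 23 advertise sizes from 1 MB to 512 GB, and that the control register's BAR Size field encodes a size as 2^(n+20) bytes. The register value is made up for the example and the function names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def supported_bar_sizes(cap_register: int):
    """Decode the supported sizes bitmask: bit 4 means 1 MiB, bit 23 means 512 GiB."""
    sizes = []
    for bit in range(4, 24):
        if cap_register & (1 << bit):
            sizes.append(1 << (bit - 4 + 20))
    return sizes


def bar_size_field(size_bytes: int) -> int:
    """Encode a size for the control register BAR Size field: size = 2**(n + 20)."""
    n = size_bytes.bit_length() - 1 - 20
    if n < 0 or (1 << (n + 20)) != size_bytes:
        raise ValueError("size must be a power of two of at least 1 MiB")
    return n


# Made-up capability value: the device advertises 256 MiB (bit 12) and 16 GiB (bit 18).
cap = (1 << 12) | (1 << 18)

sizes = supported_bar_sizes(cap)
largest = max(sizes)
print([s // MIB for s in sizes])  # sizes in MiB
print(bar_size_field(largest))    # value to write into the BAR Size field
```

The point of the sketch is that the device itself limits the choice: system software can only pick among the sizes whose bits are set, and only if the capability exists at all.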

pcie7

pcie8

pcie9

pcie10

whenever (re)bar is reported by capframex as partial at 256mb hw or 214mb vulkan this means the device is "basically" (re)bar-capable: the bar is active and has been resized by the system software which means rebar is active and enabled but the user is getting prohibited from changing the bar size! the actual sizes are predefined by bios and in this case vulkan api, they are different to each other and appear arbitrarily and sub optimally chosen without any relation, sense or intent of well being to real world scenarios. sizes might as well be set to higher or customizable values by the system softwares (bios and api) or application.

this also means that any api / software should be able to activate and control full or partial rebar availability on-the-fly as is the case with d3d where capframex reports full rebar. insofar full or to some extent controllable rebar could possibly also be achieved by vulkan, opengl, direct3d12 or simple modifications to the gpu driver, os, game or app.

rebar appears to be always available and always on in most cases for probably any kind of gpu which is d3d & ogl capable yet the user is being prohibited from controlling the size and also the system software is prevented from negotiating an optimal size that should be presented for each application and changed dynamically and automatically.

@terminatorul

bump (added "right to rebar" section above)

@terminatorul

strangely the gpuz advanced tab changed and

gpu hardware support went from
"no" -> "yes"

resizeable bar enabled in bios went from
"unsupported gpu" to "no"

graphics driver support went from
"unsupported gpu" to "yes"

i dont know how this happened because i didnt change anything except install a new driver but this seems really interesting and like something was changed by nvidia and / or gpuz.

after:
gpuz3

before:
gpuz4

so currently i am investigating a possibility to chain- / side-load the dxe uefi rebar module driver with a grub 2 compatible application such as ventoy and it seems theoretically possible indeed:

https://superuser.com/questions/1782009/can-grub-access-an-nvme-drive-when-the-bios-lacks-support-for-it

[..] in the case of UEFI, it is possible to sideload a DXE driver that is compatible [..] (with EDK2 shell or Clover whatsoever), then your "(transiently) patched" UEFI firmware can load grub [..]

https://www.globalspec.com/reference/54257/203279/chapter-12-differences-between-dxe-drivers-and-efi-drivers

The Extensible Firmware Interface (EFI) can be implemented in many ways. One way is to implement via the Driver Execution Environment (DXE) Foundation portion of the Framework. As such, DXE represents a special type of driver that can be combined with EFI drivers in a given firmware volume.

There are two basic classes of DXE drivers. The first class is DXE drivers that execute very early in the DXE phase. The execution order of these DXE drivers depends on the evaluation of dependency expressions. These early DXE drivers typically contain basic services, processor initialization code, chipset initialization code, and platform initialization code. These early drivers also typically produce the architectural protocols that are required for the DXE Core to produce its full complement of EFI Boot Services and EFI Runtime Services. In order to support the fastest possible boot time, as much initialization as possible should be deferred to the DXE drivers that follow the EFI Driver Model described in the EFI 1.10 Specification. Most of the platform and chipset drivers belong to this category. These drivers need to be aware that not all of these services may be available when they execute and use dependency expressions to make sure the protocols and services that they need are available.

https://stackoverflow.com/questions/63400839/how-to-set-dxe-drivers-loading-sequence

As far as I know, the DXE dispatcher first loads the drivers specified in the Apriori file, then loads the others according to their dependencies.
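The load order described in that answer can be sketched as a toy model: drivers from the a priori list go first, in listed order, and the rest are dispatched once the protocols they depend on have been produced. This is a simplification for illustration only, not the real dispatcher; all driver and protocol names below are hypothetical.

```python
def dispatch_order(drivers, apriori):
    """Toy DXE dispatch: drivers maps name -> (produced protocols, required protocols).

    Returns (load order, drivers that could never be dispatched).
    """
    loaded, available = [], set()

    def load(name):
        loaded.append(name)
        available.update(drivers[name][0])

    # A priori drivers are loaded first, in the order they are listed.
    for name in apriori:
        load(name)

    # Remaining drivers are loaded as soon as their dependencies are satisfied.
    pending = [name for name in drivers if name not in apriori]
    progress = True
    while pending and progress:
        progress = False
        for name in list(pending):
            if drivers[name][1] <= available:
                load(name)
                pending.remove(name)
                progress = True
    return loaded, pending


# Hypothetical firmware volume contents.
drivers = {
    "PciBus": ({"PciIo"}, {"PciRootBridgeIo"}),
    "PciHostBridge": ({"PciRootBridgeIo"}, {"CpuArch"}),
    "Cpu": ({"CpuArch"}, set()),
    "ExampleDxe": (set(), {"PciRootBridgeIo"}),
}

order, stuck = dispatch_order(drivers, apriori=["Cpu"])
print(order)
print(stuck)
```

In this model a side-loaded driver is only useful if it gets its turn before the code it wants to influence has already run, which is the crux of the chain-loading question.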

https://stackoverflow.com/questions/68696868/do-uefi-dxe-drivers-operate-in-real-mode-what-about-ring-2-or-ring-3-code

UEFI drivers can run in either protected mode or long mode, depending on their images. You should know that EFI images are Portable Executables. They have the same file format to EXEs, DLLs, SYSes, etc.. This means UEFI driver can run in protected mode if it a PE32 image. It can run in long mode if it is a PE32+ image.
You should understand there are two kinds of memories defined in UEFI: boot-service memory and runtime memory. OS loader can destroy boot-service memory once it has completed its initialization (i.e.: it no longer needs any boot services provided by the firmware). Runtime memories, however, will be preserved. No software other than runtime drivers are supposed to have access to the runtime memories, although the processor does not prevent other software from tampering them.
Anyway, in a word, codes that reside in runtime memories will run in Ring 0. OS loaders can enumerate the address map in order to know where are runtime memories so that OS will not touch them.
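Since EFI images are Portable Executables, the PE32 / PE32+ distinction mentioned above can be checked from the optional header magic (0x10B for PE32, 0x20B for PE32+). A minimal sketch, using fabricated header bytes instead of a real `.efi` file; the helper names are hypothetical.

```python
import struct

PE_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}


def pe_image_kind(image: bytes) -> str:
    """Report whether a PE image is PE32 (32-bit) or PE32+ (64-bit)."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the 20-byte COFF header.
    (magic,) = struct.unpack_from("<H", image, pe_offset + 4 + 20)
    return PE_MAGIC.get(magic, "unknown")


def fake_image(magic: int) -> bytes:
    """Build just enough of a PE header for the check above (illustration only)."""
    image = bytearray(0x100)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", image, 0x80 + 24, magic)
    return bytes(image)


print(pe_image_kind(fake_image(0x20B)))
print(pe_image_kind(fake_image(0x10B)))
```

For a real driver, the bytes would come from reading the `.efi` file in binary mode.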

https://tianocore-docs.github.io/edk2-ModuleWriteGuide/draft/8_dxe_drivers_non-uefi_drivers/88_dxe_runtime_driver.html

3.1 What is an EDK II module?
An EDK II module consists of source files or binary files and a module definition file (INF file). An INF file describes a module's basic information and interfaces such as consumed/produced library class/PCD/Protocol/Ppi/Guid. (Please refer to the EDK II Extended INF Specification.)

A typical EDK II module is a firmware component that is built, put in an FFS file and then put into a FV image. The component may be:

  • A driver or application which is built to *.efi binary file and put into FFS file as EFI_PE_SECTION (Figure 2: Firmware Volume).
  • Raw data binary. For example, $(WORKSPACE)\MdeModulePkg\Logo\Logo.inf is a raw binary module which contains logo bitmap image.
  • An option ROM driver that is put into a device's option ROM.
  • A standalone UEFI driver.
  • A standalone UEFI application.
  • A library instance that is built to a library object file (.lib) and statically linked to another module.

[..]

8.8 DXE Runtime Driver
A DXE runtime driver executes in both boot services and runtime services environments. This means the services that these modules produce are available before and after ExitBootServices() is called, including the time that an operating system is running. If SetVirtualAddressMap() is called, then modules of this type are relocated according to virtual address map provided by the operating system.

The DXE Foundation is considered a boot service component, so the DXE Foundation is also released when ExitBootServices() is called. As a result, runtime drivers may not use any of the UEFI Boot Services, DXE Services, or services produced by boot service drivers after ExitBootServices() is called.

A DXE runtime driver defines MODULE_TYPE as DXE_RUNTIME_DRIVER in the INF file. In addition, because the DXE runtime driver encounters SetVirtualAddressMap() during its life cycle, it may need to register an event handler for the event EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE.

I can imagine it is possible (but challenging!) to tell the GPU device to change BAR size after POST, from GRUB.

But this would also change the base address of the BAR. And then the GPU driver in the firmware is no longer in sync with the hardware. The firmware driver tries using the old address, and the hardware expects the new address.

Here the GPU driver is part of the system firmware, and is called GOP (Graphics Output Protocol). Windows has its own driver, and admittedly Windows would work even if the GOP does not. But only after boot. There is no boot progress, and if there is a boot problem and Windows can not load, or if you use a boot menu like GRUB, the screen would be frozen or corrupted.

In short, to change BAR size after POST means a new GOP driver should be provided as well. That would be a project in itself.

Then there is the issue of allocating the new base address, so that the new BAR does not conflict with any addresses already allocated for other devices in the system.
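The allocation problem can be sketched as follows: a BAR must be naturally aligned (its base is a multiple of its size), and the new range must not overlap anything already assigned. This is a minimal illustration under assumed numbers; the address window and the existing allocations are invented, and real firmware has more constraints (bridge windows, prefetchable ranges, and so on).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def find_bar_base(size, window_start, window_end, allocated):
    """Return the first naturally aligned, conflict-free base for a BAR, or None.

    allocated is a list of (base, size) ranges already in use.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("BAR size must be a power of two")
    base = (window_start + size - 1) & ~(size - 1)  # round up to alignment
    while base + size <= window_end:
        conflict = next(
            ((b, s) for b, s in allocated if base < b + s and b < base + size),
            None,
        )
        if conflict is None:
            return base
        # Skip past the conflicting range and realign.
        end = conflict[0] + conflict[1]
        base = (end + size - 1) & ~(size - 1)
    return None


# Invented example: a 4 GiB to 64 GiB window with two ranges already taken.
window = (4 * GIB, 64 * GIB)
in_use = [(4 * GIB, 256 * MIB), (8 * GIB, 1 * GIB)]

print(hex(find_bar_base(16 * GIB, *window, in_use)))
print(hex(find_bar_base(256 * MIB, *window, in_use)))
```

Doing this after POST is harder than it looks here, because the upstream bridge windows would also have to cover the chosen range, and every consumer of the old address has to be told about the new one.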

About GPU-Z, the maintainer added support for possible ReBAR on RTX2000 some time after NvStrapsReBar project showed up on github. I even provided some screenshots to them, after installing the pre-release (engineering) version of GPU-Z at the time.