better handling of very large memory maps
hawkw opened this issue · 4 comments
Currently, the bootloader has some issues handling BIOS memory maps that contain very high addresses, such as around the 1TB mark, and/or very large regions (in excess of 500 GB). Depending on the `bootloader` version and configuration, memory maps with high addresses or large regions may result in crashes or degrade boot performance significantly.
In particular, the following issues have been observed:
- v0.9.x of `bootloader` contains an assertion that may be triggered when the memory map contains an address too big for a single PML4 entry if the `map_physical_memory` feature flag is enabled (lines 289 to 291 in 42da77b).
- v0.10.x of `bootloader` doesn't contain that assertion, but does have offset selection code that assumes no region will require multiple PML4 entries (`bootloader/src/binary/level_4_entries.rs`, lines 172 to 174 in ac46d04).
- v0.10.x of `bootloader` exhibits very long boot times when a memory map contains a reserved region at a ~1TB offset. This is because the bootloader identity maps all pages up to the highest reserved address in the memory map using 4K pages, and does this twice (once for the bootloader itself, and a second time when setting up physical memory mappings for the kernel).
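To put rough numbers on the issues above, here is a small back-of-the-envelope sketch. The function and constant names are made up for illustration and are not taken from `bootloader`; the only facts used are that one PML4 entry spans 512 GiB and that pages are 4 KiB or 2 MiB.

```rust
/// Page sizes on x86_64 (standard page and huge page).
const PAGE_4KIB: u64 = 4 * 1024;
const PAGE_2MIB: u64 = 2 * 1024 * 1024;
/// Address range covered by a single level 4 (PML4) entry: 512 GiB.
const PML4_ENTRY_SPAN: u64 = 512 * 1024 * 1024 * 1024;

/// Number of pages of `page_size` needed to identity map `0..max_phys_addr`.
/// (Hypothetical helper, rounds up.)
fn pages_needed(max_phys_addr: u64, page_size: u64) -> u64 {
    (max_phys_addr + page_size - 1) / page_size
}

/// Number of level 4 entries needed to map `0..max_phys_addr` at an offset
/// aligned to a level 4 entry boundary. (Hypothetical helper, rounds up.)
fn pml4_entries_needed(max_phys_addr: u64) -> u64 {
    (max_phys_addr + PML4_ENTRY_SPAN - 1) / PML4_ENTRY_SPAN
}
```

With a highest address of 1 TiB, `pages_needed` gives 268,435,456 pages at 4 KiB versus 524,288 at 2 MiB, and `pml4_entries_needed` gives 2, which is exactly the case a single-entry assumption cannot handle.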
These issues are relevant because AMD systems with an IOMMU have a reserved hole close to the 1TB mark. In recent QEMU versions (>= v7.1.0), QEMU reports this region as reserved in its e820 BIOS memory map, resulting in assertion failures or boot performance degradation. I would assume this would cause issues when booting on real AMD hardware as well.
Some changes that would improve how `bootloader` handles high addresses in the memory map (many of which were suggested by @phil-opp on Gitter):
- Change the automatic offset selection to support regions that need multiple entries in the level 4 page table (v0.9.x, v0.10.x, and probably v0.11.x)
- Don't identity map reserved regions at all when performing the initial identity mapping for the bootloader itself (v0.10.x and probably v0.11.x)
  - The framebuffer would still need to be identity mapped so that the bootloader can write to it.
- Use 2MB rather than 4KB pages when identity mapping the kernel address space, which would improve performance (v0.10.x and v0.11.x)
- Consider not identity mapping holes in the memory map at all, only the reserved regions that would need to be mapped for the kernel. This way, we wouldn't map every page between the second-highest reserved region and the 1TB hole on AMD systems.
- Consider not identity mapping that specific reserved region at all. It's an unusable hole, not an MMIO region or BIOS structure that the kernel would actually want to access...
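As a rough illustration of the first suggestion (offset selection that can hand out several level 4 entries at once), something along these lines might work. `UsedLevel4Entries` here is a simplified stand-in written for this sketch, and `get_free_entries` is a hypothetical method name, not the bootloader's actual API.

```rust
/// Simplified stand-in for the bookkeeping of used level 4 entries
/// (assumption: one flag per entry, 512 entries in total).
struct UsedLevel4Entries {
    used: [bool; 512],
}

impl UsedLevel4Entries {
    /// Finds `count` consecutive unused entries, marks them as used, and
    /// returns the index of the first one. Returns `None` if no such run exists.
    fn get_free_entries(&mut self, count: usize) -> Option<usize> {
        if count == 0 || count > self.used.len() {
            return None;
        }
        // Slide a window of `count` entries over the table and take the
        // first position where every entry in the window is unused.
        let start = self
            .used
            .windows(count)
            .position(|window| window.iter().all(|used| !*used))?;
        for entry in &mut self.used[start..start + count] {
            *entry = true;
        }
        Some(start)
    }
}

/// Virtual address at which a level 4 entry begins, before sign extension
/// (each entry spans 512 GiB, i.e. 2^39 bytes).
fn level4_entry_start(index: usize) -> u64 {
    (index as u64) << 39
}
```

A caller would first work out how many entries the physical memory mapping needs (2 for a map reaching the 1TB mark) and then request that many in one go, instead of assuming a single entry is enough.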
(as a side note, I'd be happy to work on some or all of these changes)
Thanks a lot for creating this issue!
I'm a bit short on time right now, so I would appreciate any help! I would suggest that we fix things in the following order:
- Change the automatic offset selection to support regions that need multiple entries in the level 4 page table (in v0.9 and v0.10) and remove the related assertion (in v0.9). This should be enough to fix the boot errors.
- In the BIOS identity mapping of v0.10, ignore all reserved regions with start address > 4GiB when calculating `max_phys_addr`. This should be possible by adding an additional `filter` call here (before the `map`): lines 86 to 90 in ac46d04. The reason for the 4GiB bound is that this is the maximum addressable memory in protected mode, so everything important should be contained there (e.g. the framebuffer).
- For the kernel-requested physical memory mapping we don't want to skip regions. However, instead of mapping the full `0..max_phys_addr` range, we could instead iterate over the reported regions and map only them. The relevant part of the code is line 238 in ac46d04. This could instead become something like the following (pseudo-code):

      for region in frame_allocator.regions() {
          for frame in PhysFrame::range_inclusive(region.start_frame(), region.end_frame()) {
              ...

  We could still use 2MiB pages for these mappings and skip any pages that are mapped already.
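To make the last two steps a bit more concrete, here is a host-side sketch. `Region`, `RegionKind`, `max_phys_addr`, and `frames_to_map` are all simplified, hypothetical names invented for this example; the real bootloader types differ, and a `no_std` bootloader would not collect frames into a `Vec`.

```rust
/// Simplified memory region kinds (assumption, not the bootloader's enum).
#[derive(Clone, Copy, PartialEq, Debug)]
enum RegionKind {
    Usable,
    Reserved,
}

/// Simplified memory map entry (assumption, not the bootloader's struct).
#[derive(Clone, Copy, Debug)]
struct Region {
    start: u64,
    len: u64,
    kind: RegionKind,
}

const FOUR_GIB: u64 = 4 * 1024 * 1024 * 1024;
const HUGE_PAGE_SIZE: u64 = 2 * 1024 * 1024;

/// Highest address the bootloader's own identity mapping has to cover,
/// ignoring reserved regions that start above 4 GiB.
fn max_phys_addr(regions: &[Region]) -> u64 {
    regions
        .iter()
        .filter(|r| r.kind != RegionKind::Reserved || r.start <= FOUR_GIB)
        .map(|r| r.start + r.len)
        .max()
        .unwrap_or(0)
}

/// Start addresses of the 2 MiB frames that cover the reported regions only.
/// Frames shared by two regions appear once, so nothing is mapped twice.
fn frames_to_map(regions: &[Region]) -> Vec<u64> {
    let mut frames = Vec::new();
    for region in regions.iter().filter(|r| r.len > 0) {
        let first = region.start / HUGE_PAGE_SIZE;
        let last = (region.start + region.len - 1) / HUGE_PAGE_SIZE;
        for frame in first..=last {
            frames.push(frame * HUGE_PAGE_SIZE);
        }
    }
    frames.sort_unstable();
    frames.dedup();
    frames
}
```

With a map that has a few megabytes of low memory plus a reserved region near the 1TB mark, `max_phys_addr` stays in the low range, and `frames_to_map` yields only the frames inside reported regions rather than everything in between.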
The code for version v0.10 is in the `main` branch and the code for v0.9 is in `v0.9-base`. We can ignore the upcoming v0.11 release for now; I'll cherry-pick the relevant parts later.
Regarding:
> Use 2MB rather than 4KB pages when identity mapping the kernel address space, which would improve performance (v0.10.x and v0.11.x)
I think this is already happening. I was just confused because we don't specify the page size for the start frame here (lines 227 to 229 in ac46d04). However, we do specify the size for the end frame. Since start and end frame must be of the same type, type inference should automatically choose `Size2MiB` for the start frame as well. It would still be a good idea to make this more explicit :D.
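For anyone curious how that inference plays out, here is a tiny self-contained model. The `Frame`, `PageSize`, `Size4KiB`, and `Size2MiB` definitions below are minimal stand-ins written for this sketch (loosely modelled on the idea of size-typed frames), not the real types the bootloader uses.

```rust
use core::marker::PhantomData;

/// Minimal stand-in for a page-size marker trait (assumption).
trait PageSize {
    const SIZE: u64;
}
struct Size4KiB;
struct Size2MiB;
impl PageSize for Size4KiB {
    const SIZE: u64 = 4096;
}
impl PageSize for Size2MiB {
    const SIZE: u64 = 2 * 1024 * 1024;
}

/// Minimal stand-in for a physical frame typed by its page size (assumption).
struct Frame<S: PageSize> {
    start: u64,
    size: PhantomData<S>,
}

impl<S: PageSize> Frame<S> {
    /// Frame containing `addr`, aligned down to the page size `S`.
    fn containing_address(addr: u64) -> Self {
        Frame { start: addr - addr % S::SIZE, size: PhantomData }
    }
}

/// Number of frames in the inclusive range. Both ends share the parameter `S`,
/// so annotating one end fixes the type of the other.
fn frames_in_range_inclusive<S: PageSize>(start: Frame<S>, end: Frame<S>) -> u64 {
    (end.start - start.start) / S::SIZE + 1
}

/// Start frame has no annotation; its size is inferred from the end frame.
fn count_inferred(max_phys_addr: u64) -> u64 {
    let start = Frame::containing_address(0);
    let end: Frame<Size2MiB> = Frame::containing_address(max_phys_addr);
    frames_in_range_inclusive(start, end)
}

/// Same range with the page size spelled out on both ends.
fn count_explicit(max_phys_addr: u64) -> u64 {
    let start = Frame::<Size2MiB>::containing_address(0);
    let end = Frame::<Size2MiB>::containing_address(max_phys_addr);
    frames_in_range_inclusive(start, end)
}
```

Both functions produce the same frame count; the explicit version just makes the 2MiB choice visible at the call site, which is the readability improvement suggested above.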