rust-osdev/bootloader

better handling of very large memory maps

hawkw opened this issue · 4 comments

hawkw commented

Currently, the bootloader has some issues handling BIOS memory maps that contain very high addresses, such as around the 1TB mark, and/or very large regions (in excess of 500 GB). Depending on the bootloader version and configuration, memory maps with high addresses or large regions may result in crashes or degrade boot performance significantly.

In particular, the following issues have been observed:

  • v0.9.x of bootloader contains an assertion that is triggered when the map_physical_memory feature flag is enabled and the memory map contains an address too large for a single PML4 entry:

    bootloader/src/main.rs

    Lines 289 to 291 in 42da77b

    // If offset not manually provided, find a free p4 entry and map memory here.
    // One level 4 entry spans 2^48/512 bytes (over 500gib) so this should suffice.
    assert!(max_phys_addr < (1 << 48) / 512);
    (see also phil-opp/blog_os#1138)
  • v0.10.x of bootloader doesn't contain that assertion, but does have offset selection code that assumes no region will require multiple PML4 entries:
    let base =
        Page::from_page_table_indices_1gib(self.get_free_entry(), PageTableIndex::new(0))
            .start_address();
  • v0.10.x of bootloader exhibits very long boot times when a memory map contains a reserved region at a ~1TB offset. This is because the bootloader will identity map all pages up to the highest reserved address in the memory map using 4K pages, and does this twice (once for the bootloader itself, and a second time when setting up physical memory mappings for the kernel). This results in a very slow boot process.
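For reference, the arithmetic behind the quoted assertion can be checked directly; this small standalone snippet only restates the numbers from the comments above:

```rust
// Sanity check of the comment's arithmetic: one level-4 (PML4) entry
// spans 2^48 / 512 = 2^39 bytes = 512 GiB, so any physical address at
// or above that bound trips the quoted assertion.
const PML4_ENTRY_SPAN: u64 = (1u64 << 48) / 512;

fn main() {
    assert_eq!(PML4_ENTRY_SPAN, 1u64 << 39);
    assert_eq!(PML4_ENTRY_SPAN, 512 * 1024 * 1024 * 1024); // 512 GiB
    // A reserved region near the 1 TiB mark exceeds a single entry:
    let one_tib: u64 = 1u64 << 40;
    assert!(one_tib >= PML4_ENTRY_SPAN);
}
```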

These issues are relevant because AMD systems with an IOMMU have a reserved hole close to the 1TB mark. In recent QEMU versions (>= v7.1.0), QEMU reports this region as reserved in its e820 BIOS memory map, resulting in assertion failures or boot performance degradation. I would assume this causes issues when booting on real AMD hardware as well.

Some changes that would improve how bootloader handles high addresses in the memory map (many of which were suggested by @phil-opp on Gitter):

  • Change the automatic offset selection to support regions that need multiple entries in the level 4 page table (v0.9.x, v0.10.x, and probably v0.11.x)
  • Don't identity map reserved regions at all when performing the initial identity mapping for the bootloader itself (v0.10.x and probably v0.11.x)
    • The framebuffer would still need to be identity mapped so that the bootloader can write to it.
  • Use 2MB rather than 4KB pages when identity mapping the kernel address space, which would improve performance (v0.10.x and v0.11.x)
  • Consider not identity mapping holes in the memory map at all, only the reserved regions that would need to be mapped for the kernel. This way, we wouldn't map every page between the second-highest reserved region and the 1TB hole on AMD systems.
  • Consider not identity mapping that specific reserved region at all. It's an unusable hole, not an MMIO region or BIOS structure that the kernel would actually want to access...
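For the first suggestion, the core of a multi-entry offset selection is just a round-up division; a minimal sketch (the bootloader would then need to find that many contiguous free level-4 entries, which `get_free_entry` currently doesn't do):

```rust
// One PML4 entry spans 2^48 / 512 = 2^39 bytes = 512 GiB.
const PML4_ENTRY_SPAN: u64 = 1u64 << 39;

/// How many level-4 entries a physical address space of the given
/// size needs, rounding up so partial coverage still gets an entry.
fn entries_needed(max_phys_addr: u64) -> u64 {
    (max_phys_addr + PML4_ENTRY_SPAN - 1) / PML4_ENTRY_SPAN
}

fn main() {
    assert_eq!(entries_needed(1u64 << 39), 1); // exactly 512 GiB
    assert_eq!(entries_needed((1u64 << 40) - 1), 2); // just under 1 TiB
    assert_eq!(entries_needed(1u64 << 40), 2); // 1 TiB needs two entries
}
```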
hawkw commented

(as a side note, I'd be happy to work on some or all of these changes)

phil-opp commented

Thanks a lot for creating this issue!

I'm a bit short on time right now, so I would appreciate any help! I would suggest that we fix things in the following order:

  1. Change the automatic offset selection to support regions that need multiple entries in the level 4 page table (in v0.9 and v0.10) and remove the related assertion (in v0.9). This should be enough to fix the boot errors.
  2. In the BIOS identity mapping of v0.10, ignore all reserved regions with a start address > 4 GiB when calculating max_phys_addr. This should be possible by adding an additional filter call here (before the map):
    let max_phys_addr = e820_memory_map
        .iter()
        .map(|r| r.start_addr + r.len)
        .max()
        .expect("no physical memory regions found");

    The reason for the 4GiB bound is that this is the maximum addressable memory in protected mode, so everything important should be contained there (e.g. the framebuffer).
  3. For the kernel-requested physical memory mapping we don't want to skip regions. However, instead of mapping the full 0..max_phys_addr range, we could instead iterate over the reported regions and map only them. The relevant part of the code is:
    for frame in PhysFrame::range_inclusive(start_frame, end_frame) {

    This could instead become something like the following (pseudo-code):

    for region in frame_allocator.regions() {
        for frame in PhysFrame::range_inclusive(region.start_frame(), region.end_frame()) {
            ...
        }
    }
    We could still use 2MiB pages for these mappings and skip any pages that are mapped already.
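A self-contained sketch of the step-2 filter, assuming a stand-in `E820Region` struct with the same `start_addr`/`len` fields as the real e820 entry type (the region values below are illustrative, loosely modeled on the ~1 TiB reserved hole QEMU reports):

```rust
// Sketch of step 2: ignore regions starting at or above 4 GiB when
// computing max_phys_addr for the bootloader's own identity mapping.
// `E820Region` is a stand-in for the real e820 entry type.
struct E820Region {
    start_addr: u64,
    len: u64,
}

fn max_phys_addr(map: &[E820Region]) -> u64 {
    map.iter()
        .filter(|r| r.start_addr < (1u64 << 32)) // skip regions above 4 GiB
        .map(|r| r.start_addr + r.len)
        .max()
        .expect("no physical memory regions found")
}

fn main() {
    let map = [
        E820Region { start_addr: 0, len: 0x9_fc00 },
        E820Region { start_addr: 0x10_0000, len: 0x7ff0_0000 },
        // Reserved hole just below the 1 TiB mark.
        E820Region { start_addr: 0xfd_0000_0000, len: 0x3_0000_0000 },
    ];
    // The ~1 TiB region is ignored; the bound comes from the RAM below 4 GiB.
    assert_eq!(max_phys_addr(&map), 0x8000_0000); // 2 GiB
}
```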
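The region-wise iteration from step 3, reduced to a self-contained sketch that only collects the frames inside reported regions (the frame size, `Region` type, and region values are illustrative; the real loop would call `map_to` on each frame and skip frames that are already mapped):

```rust
// Sketch of step 3: iterate over reported memory regions and visit
// only the frames inside them, instead of every frame in 0..max_phys.
const FRAME_SIZE: u64 = 2 * 1024 * 1024; // 2 MiB

#[derive(Clone, Copy)]
struct Region {
    start: u64, // inclusive, frame-aligned
    end: u64,   // exclusive, frame-aligned
}

fn frames_to_map(regions: &[Region]) -> Vec<u64> {
    let mut frames = Vec::new();
    for r in regions {
        let mut addr = r.start;
        while addr < r.end {
            // The real code would map the frame here, skipping
            // frames that an earlier region already mapped.
            frames.push(addr);
            addr += FRAME_SIZE;
        }
    }
    frames
}

fn main() {
    // Two regions with a huge hole between them, like the ~1 TiB
    // reserved hole on AMD systems.
    let regions = [
        Region { start: 0, end: 4 * FRAME_SIZE },
        Region { start: 1u64 << 40, end: (1u64 << 40) + 2 * FRAME_SIZE },
    ];
    // Only 6 frames get visited instead of ~524,000 for the full range.
    assert_eq!(frames_to_map(&regions).len(), 6);
}
```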

The code for version v0.10 is in the main branch and the code for v0.9 is in v0.9-base. We can ignore the upcoming v0.11 release for now, I'll cherry-pick the relevant parts later.


Regarding:

Use 2MB rather than 4KB pages when identity mapping the kernel address space, which would improve performance (v0.10.x and v0.11.x)

I think this is already happening. I was just confused because we don't specify the page size for the start frame here:

let start_frame = PhysFrame::containing_address(PhysAddr::new(0));
let max_phys = frame_allocator.max_phys_addr();
let end_frame: PhysFrame<Size2MiB> = PhysFrame::containing_address(max_phys - 1u64);

However, we do specify the size for the end frame. Since start and end frame must be of the same type, type inference should automatically choose Size2MiB for the start frame as well. It would still be a good idea to make this more explicit :D.
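To illustrate the inference point with a self-contained analogue (the `PhysFrame` and `Size2MiB` types here are simplified stand-ins, not the `x86_64` crate's): since both endpoints must share the same size parameter, annotating only the end frame pins the start frame's type too.

```rust
// Simplified stand-ins for PhysFrame<S> and Size2MiB showing how
// annotating one endpoint fixes the type of the other.
use std::marker::PhantomData;

struct Size2MiB;

struct PhysFrame<S> {
    addr: u64,
    _size: PhantomData<S>,
}

impl<S> PhysFrame<S> {
    fn containing_address(addr: u64) -> Self {
        PhysFrame { addr, _size: PhantomData }
    }
}

// Both endpoints must have the same `S`, mirroring range_inclusive.
fn range_inclusive<S>(start: PhysFrame<S>, end: PhysFrame<S>) -> (u64, u64) {
    (start.addr, end.addr)
}

fn main() {
    // `start_frame` has no annotation; unification with `end_frame`
    // in `range_inclusive` forces S = Size2MiB for it as well.
    let start_frame = PhysFrame::containing_address(0);
    let end_frame: PhysFrame<Size2MiB> = PhysFrame::containing_address((1u64 << 32) - 1);
    let (s, e) = range_inclusive(start_frame, end_frame);
    assert_eq!((s, e), (0, (1u64 << 32) - 1));
}
```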

hawkw commented

Opened #260 to ignore physical addresses in excess of 4 GiB when identity mapping for the bootloader itself.

* [ ]  Change the automatic offset selection to support regions that need multiple entries in the level 4 page table (v0.9.x, v0.10.x, and probably v0.11.x)

I'm working on implementing this. This will also fix #262.