Aarch64 struct init causes error interrupt in bare metal kernel

Question

Closed this issue 2 years ago · 7 comments

Zig Version

0.9.1

Steps to Reproduce

In order to reproduce the issue, some kind of bare metal aarch64 environment is required. In my case, it's a qemu aarch64 elf bootable compiled and linked with the zig builder. Here is my repository with more context.

The issue is tricky, and I'm still trying to find a consistent pattern. It is not reliably reproducible and only occurs in certain scenarios. The general pattern seems to be that as soon as "the" struct gets initialized, somewhere within the initialization the CPU jumps to the interrupt vector (0x200) and loops because no error handler is set up yet.

If I compile the kernel bootable directly in Debug mode, the CPU also jumps immediately to 0x200 no matter what code is in kernel_main, which is really interesting I guess.

Scenario 1:
This scenario is very consistent in its behavior, but difficult to reproduce I guess. It depends on an MMIO write, qemu_cfg_write_entry(&ramfb_cfg, select, @sizeOf(qemu_dma.QemuRAMFBCfg));, in ramfb_setup. There the CPU branches to 0x200 during the ramfb_cfg init, always at the second field, fourcc.
Whether it branches depends on several factors.
It branches to the interrupt, if:

qemu_cfg_write_entry is called afterwards
the last n elements of ramfb_cfg are anything other than 0 (even undefined causes a branch)
still branches if .fourcc is 0 (and still at exactly that point...)

It does not branch if:

qemu_cfg_write_entry is not called
(edit: I got this the wrong way around) the last n elements are 0
if instead of a pointer to the struct, a pointer to a local int (of any size) is passed to qemu_cfg_write_entry

If I init the struct in the main function and then pass the pointer to ramfb_setup(), it still branches at .fourcc, now in the main function.

The code that runs before that does not have any influence on the issue (I still included it for context).
Sadly there is no minimal reproducible example other than my repository (but that is really simple to set up on any aarch64 system, and all the setup is contained within build.zig).

pub const QemuRAMFBCfg = packed struct {
    addr: u64,
    fourcc: u32,
    flags: u32,
    width: u32,
    height: u32,
    stride: u32,
};

export fn kernel_main() callconv(.Naked) noreturn {
    // get address of external linker script variable which marks stack-top and heap-start
    const heap_start: *anyopaque = @as(*anyopaque, @extern(?*u8, .{ .name = "_stack_top" }) orelse {
        serial.kprintf("error reading _stack_top label\n", .{}) catch unreachable;
        unreachable;
    });

    var allocator = WaterMarkAllocator.init(heap_start, 5000000);

    ramFb.ramfb_setup(&allocator) catch |err| {
        serial.kprintf("error while setting up ramfb: {u} \n", .{@errorToInt(err)}) catch unreachable;
    };

    while (true) {}
}


pub fn ramfb_setup(allocator: *WaterMarkAllocator) !void {
    const select = qemu_dma.qemu_cfg_find_file() orelse return RamFbError.RamfbFileNotFound;
    serial.kprintf("before  malloc \n", .{}) catch unreachable;
    serial.kprintf("after init \n", .{}) catch unreachable;

    var fb = allocator.malloc(fb_size) catch unreachable;
    var ramfb_cfg = qemu_dma.QemuRAMFBCfg{
        .addr = @byteSwap(u64, @ptrToInt(fb)),
        .fourcc = @byteSwap(u32, drm_format_xrgb8888),
        .flags = 0,
        .width = @byteSwap(u32, fb_width),
        .height = @byteSwap(u32, fb_height),
        .stride = @byteSwap(u32, fb_stride),
    };

    // var i_replacing_struct: u64 = 4;
    serial.kprintf("ramfb_cfg: {u}, {u}, {u}, {u}, {u}, {u}", .{ ramfb_cfg.addr, ramfb_cfg.fourcc, ramfb_cfg.flags, ramfb_cfg.width, ramfb_cfg.height, ramfb_cfg.stride }) catch unreachable;
    qemu_dma.qemu_cfg_write_entry(&ramfb_cfg, select, @sizeOf(qemu_dma.QemuRAMFBCfg));
    serial.kprintf("after write \n", .{}) catch unreachable;
}

fn qemu_cfg_dma_transfer(addr: u64, len: u32, control: u32) void {
    dma_acc = .{ .control = @byteSwap(u32, control), .len = @byteSwap(u32, len), .address = @byteSwap(u64, addr) };
    barrier();
    // writing to most significant with offset 0 since it's aarch*64*
    const base_addr_upper = @intToPtr(*u64, qemu_cfg_dma_base_dma_addr);
    base_addr_upper.* = @byteSwap(u64, @ptrToInt(&dma_acc));

    // rather ugly cast to volatile with off alignment (because of packed struct) required
    const dma_acc_ctrl_check = @ptrCast(*align(1) volatile u32, &dma_acc.control);
    while ((@byteSwap(u32, dma_acc_ctrl_check.*) & ~@intCast(u8, qemu_cfg_dma_ctl_error)) != 0) {}
}

pub fn qemu_cfg_write_entry(buff: *anyopaque, e: u32, len: u32) void {
    var control: u32 = (e << 16) | @enumToInt(QemuCfgDmaControlBits.qemu_cfg_dma_ctl_select) | @enumToInt(QemuCfgDmaControlBits.qemu_cfg_dma_ctl_write);
    qemu_cfg_dma_transfer(@ptrToInt(buff), len, control);
}

Scenario 2:

The second scenario depends on whether the function kprint_ui calls another function which in turn returns a struct (it makes no difference whether the struct is anonymous or not). If it does call that other function (in this case uitoa), the CPU branches to 0x200 somewhere within uitoa, most often at its return struct init.
If I call kprint_ui_full instead (where uitoa is not called but its body is pasted in; marking it inline is not enough, that still crashes, I guess because the struct is still initialized), it runs properly and the kernel ends up in the loop.

export fn kernel_main() callconv(.Naked) noreturn {
    // serial.kprint_ui_full(100, utils.PrintStyle.string);
    serial.kprint_ui(100, utils.PrintStyle.string);
}

pub fn kprint_ui(num: u64, print_style: utils.PrintStyle) void {
    var ret = utils.uitoa(num, print_style);
    var j: usize = 0;
    while (j < ret.len) : (j += 1) {
        put_char(ret.arr[j]);
    }
}

pub fn kprint_ui_full(num: u64, print_style: utils.PrintStyle) void {
    var str = [_]u8{0} ** 20;

    if (num == 0) {
        str[0] = 0;
        return;
    }

    var rem: u64 = 0;
    var i: u8 = 0;
    var num_i = num;
    while (num_i != 0) {
        rem = @mod(num_i, @enumToInt(print_style));
        if (rem > 9) {
            str[i] = @truncate(u8, (rem - 10) + 'a');
        } else {
            str[i] = @truncate(u8, rem + '0');
        }
        i += 1;

        num_i = num_i / @enumToInt(print_style);
    }
    utils.reverse_string(&str, i);

    var j: usize = 0;
    while (j < i) : (j += 1) {
        put_char(str[j]);
    }
}

// 20 is u64 max len in u8
pub fn uitoa(num: u64, print_style: PrintStyle) struct { arr: [20]u8, len: u8 } {
    var str = [_]u8{0} ** 20;

    if (num == 0) {
        str[0] = 0;
        return .{ .arr = str, .len = 0 };
    }

    var rem: u64 = 0;
    var i: u8 = 0;
    var num_i = num;
    while (num_i != 0) {
        rem = @mod(num_i, @enumToInt(print_style));
        if (rem > 9) {
            str[i] = @truncate(u8, (rem - 10) + 'a');
        } else {
            str[i] = @truncate(u8, rem + '0');
        }
        i += 1;

        num_i = num_i / @enumToInt(print_style);
    }
    reverse_string(&str, i);
    return .{ .arr = str, .len = i };
}

(Edit) Scenario 3:

I tried to find a workaround and found one that works, but it is really weird and I can't make any sense of it.
My approach was to remove as much excess logic as possible, which in this case meant the function call structure.
Afterwards I just removed random bits and checked whether it would work. The result is not too different, except that the barrier() call is removed and all the helper functions are gone, with their bodies pasted in place (again, inline does not work). Removing only the barrier call while keeping the functions does not work either. It's really weird.

(the code below works and no branch occurs)

// this is the same barrier function as used in the examples above
pub fn barrier() void {
    asm volatile ("ISB");
}

pub fn ramfb_setup(allocator: *WaterMarkAllocator) !void {
    const select = qemu_dma.qemu_cfg_find_file() orelse return RamFbError.RamfbFileNotFound;
    serial.kprintf("before  malloc \n", .{}) catch unreachable;
    serial.kprintf("after init \n", .{}) catch unreachable;

    var fb = allocator.malloc(fb_size) catch unreachable;
    var ramfb_cfg = qemu_dma.QemuRAMFBCfg{
        .addr = @byteSwap(u64, @ptrToInt(fb)),
        .fourcc = @byteSwap(u32, drm_format_xrgb8888),
        .flags = 0,
        .width = @byteSwap(u32, fb_width),
        .height = @byteSwap(u32, fb_height),
        .stride = @byteSwap(u32, fb_stride),
    };

    var control: u32 = (select << 16) | @enumToInt(qemu_dma.QemuCfgDmaControlBits.qemu_cfg_dma_ctl_select) | @enumToInt(qemu_dma.QemuCfgDmaControlBits.qemu_cfg_dma_ctl_write);

    var dma_acc = .{ .control = @byteSwap(u32, control), .len = @byteSwap(u32, @sizeOf(qemu_dma.QemuRAMFBCfg)), .address = @byteSwap(u64, @ptrToInt(&ramfb_cfg)) };
    // qemu_dma.barrier();
    // writing to most significant with offset 0 since it's aarch*64*
    const base_addr_upper = @intToPtr(*u64, 0x9020000 + 16);
    base_addr_upper.* = @byteSwap(u64, @ptrToInt(&dma_acc));

    // rather ugly cast to volatile with off alignment (because of packed struct) required
    const dma_acc_ctrl_check = @ptrCast(*align(1) volatile u32, &dma_acc.control);
    while ((@byteSwap(u32, dma_acc_ctrl_check.*) & ~@intCast(u8, 0x01)) != 0) {}

    serial.kprintf("done \n", .{}) catch unreachable;

    // // var i_replacing_struct: u64 = 4;
    // serial.kprintf("ramfb_cfg: {u}, {u}, {u}, {u}, {u}, {u}", .{ ramfb_cfg.addr, ramfb_cfg.fourcc, ramfb_cfg.flags, ramfb_cfg.width, ramfb_cfg.height, ramfb_cfg.stride }) catch unreachable;
    // qemu_dma.qemu_cfg_write_entry(&ramfb_cfg, select, @sizeOf(qemu_dma.QemuRAMFBCfg));
    // serial.kprintf("after write \n", .{}) catch unreachable;
}

Expected Behavior

Not branch to 0x200 and instead continue to the kernel_main loop.

Actual Behavior

As already mentioned, it branches to an "error" interrupt handler. If I dereference the handler address, it returns the value 16777216, but I can't find any information on what that means.

(gdb) p *0x0000000000000200
$1 = 16777216

Answer 1 · 2022-06-13T17:46:28.000Z

Hmm, is 0x200 the address for one of the SP_EL1/2/3 exception level handlers?

Also, 16777216 is 0x1000000, which somehow /feels/ relevant. Sadly my aarch64 knowledge is lacking

Have you tried actually implementing some of the exception handlers?
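
For reference, a minimal sketch of how an offset such as 0x200 maps onto the AArch64 exception vector table. It assumes VBAR_EL1 is 0, which the jump to 0x200 suggests but the thread does not confirm. The function names vectorSource and vectorKind are made up for this illustration.

```zig
const std = @import("std");

// The AArch64 vector table has 16 entries of 0x80 bytes each, relative to VBAR_ELx.
// Bits [10:9] of the offset select where the exception came from,
// bits [8:7] select the exception kind.
fn vectorSource(offset: u64) []const u8 {
    return switch (offset & 0x600) {
        0x000 => "current EL, SP_EL0",
        0x200 => "current EL, SP_ELx",
        0x400 => "lower EL, AArch64",
        else => "lower EL, AArch32",
    };
}

fn vectorKind(offset: u64) []const u8 {
    return switch (offset & 0x180) {
        0x000 => "synchronous",
        0x080 => "IRQ",
        0x100 => "FIQ",
        else => "SError",
    };
}

pub fn main() void {
    const vbar_el1: u64 = 0; // assumption: vector base left at 0
    const target: u64 = 0x200; // address the CPU ended up at
    const offset = target - vbar_el1;
    std.debug.print("0x{x}: {s} exception, {s}\n", .{ target, vectorKind(offset), vectorSource(offset) });
}
```

Under that assumption, 0x200 is the entry for a synchronous exception taken from the current exception level while using SP_ELx, that is, a fault raised by the executing instruction itself rather than an external interrupt.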

Answer 2 · 2022-06-14T00:40:42.000Z

Not yet but now I'm planning to :D

Answer 3 · 2022-06-17T10:56:13.000Z

So, I have not started to implement the interrupt handler yet, but I now have more information about the issue.
Since I opened this issue, I rewrote the project in C and am confident that neither the MMIO nor the Zig code (it's not objectively broken) is the issue.

I also did some more thorough debugging, and I guess (I'm not 100% certain or anything) structs really are broken in this specific context, since I can now rule out the MMIO as the cause.

It really shows if the kernel is compiled in Debug mode, since it then causes an interrupt as soon as the first struct gets initialized (and returned).
Actual code:

// in kernel.zig
export fn kernel_main() callconv(.Naked) noreturn {
    // get address of external linker script variable which marks stack-top and heap-start
    const heap_start: *anyopaque = @as(*anyopaque, @extern(?*u8, .{ .name = "_stack_top" }) orelse {
        serial.kprintf("error reading _stack_top label\n", .{}) catch unreachable;
        unreachable;
    });

    var allocator = WaterMarkAllocator.init(heap_start, 5000000);
....
}

pub const WaterMarkAllocator = struct {
    const max_frees = 100;
    alloc_base: [*]u8,

    alloc_bottom: usize,
    alloc_top: usize,

    currently_free: usize,
    // todo => dynamic
    freed_zones: [max_frees]struct { freed_base: *anyopaque, freed_size: usize },
...
    pub fn malloc(self: *WaterMarkAllocator, size: usize) !*anyopaque {
        if (self.alloc_bottom + size > self.alloc_top) {
            return AllocationError.MaxMem;
        }
        self.alloc_bottom += size;
        return @ptrCast(*anyopaque, self.alloc_base + self.alloc_bottom);
    }
....
    pub fn init(base: *anyopaque, alloc_size: usize) WaterMarkAllocator {
        return .{ .alloc_base = @ptrCast([*]u8, base), .alloc_bottom = 0, .alloc_top = alloc_size, .currently_free = 0, .freed_zones = undefined };
    }
};

gdb log:

36	        return .{ .alloc_base = @ptrCast([*]u8, base), .alloc_bottom = 0, .alloc_top = alloc_size, .currently_free = 0, .freed_zones = undefined };
(gdb)
0x00000000400006b8	36	        return .{ .alloc_base = @ptrCast([*]u8, base), .alloc_bottom = 0, .alloc_top = alloc_size, .currently_free = 0, .freed_zones = undefined };
(gdb)
0x00000000400017b8 in memset ()
(gdb)
0x0000000000000200 in ?? ()

In ReleaseSmall mode (which I used in the initial issue description), the issue does not occur as immediately. As described above, it goes in the same direction, but later, or maybe it messes with the stack.
To me it looks like a miscompile, since most of the scenarios (described in the initial issue description) depend on code that is executed "in the future". Meaning that it jumps to the interrupt handler based on whether a certain function is contained within the code (a function that would have been executed after the line at which it crashed and seemingly has nothing to do with it).

Answer 4 · 2022-06-19T09:44:41.000Z

So, after experimenting with it, there are a few things which may be interesting.
I wanted to get the kernel to run with a Debug build which, as already mentioned, did not work because in Debug it crashed at the struct return of the first called function (the allocator init).
I fixed that by removing the undefined element. Next was an issue with a potential out-of-bounds access (which wasn't one, but instead weird pointer arithmetic; anyhow, I fixed it by lowering the bound).
Now, surprisingly, that ran without any issues. So I thought maybe it would now work in ReleaseSmall, but that wasn't the case. The ReleaseSmall behaviour was completely unchanged, with the same crash path.
Well, back to Debug: that did run, but for some reason I can't do stack array allocations anymore (which I can do in ReleaseSmall), and the MMIO writes pass but are not really written (at least that's what the gdb reads say, and the driver also does not do what it's supposed to do).

Answer 5 · 2022-07-26T09:35:31.000Z

So after a short break I decided to implement the interrupt handler. I did that in Zig as well (which in hindsight wasn't the smartest thing to do, but it works, I guess because an interrupt handler is pretty simple in the end: no anonymous structs, no DMA, no nested functions, no memory quirks, just a few exported functions).
The GICv3 (Arm Generic Interrupt Controller) implementation can be found here but is also in the ZigKernel project.
As already mentioned, I'm pretty new to embedded things, so I can't claim that my implementation is perfect or even correct (so take it with a grain of salt, I guess, although I have to say that the implementation has been extremely consistent and aligns with what I expected in my tests with other interrupt types).

The exception created by this issue's bug is signaled via el1_sync_irq. The only register (that I found!) that was really worth looking into is the ESR (exception syndrome register), which holds all kinds of information about an exception. Among that is the exception class (EC), which I thought could be of value. In this case the ESR EC is 0b000111: "Access to SVE, Advanced SIMD or floating-point functionality trapped by CPACR_EL1.FPEN, CPTR_EL2.FPEN, CPTR_EL2.TFP, or CPTR_EL3.TFP control". I don't really know how that fits into the bigger picture, but it's something I guess.

An exception occur:
elr: 1073753312, esr: 534773760
(x0:1073753756, x1:64, x2:2, x3:0, x4:0, x5:0, x6:0, x7:0, x8:150994944, x9:111, x10:117, x11:110, x12:100, x13:1073752152, x14:1073741824, x15:0, x16:0, x17:0, x18:0, x19:102, x20:32, x21:1, x22:0, x23:114, x24:0, x25:97, x26:0, x27:0, x28:0, x29:0, )
32 bit instruction trapped
Exception Class(from esp reg): sveAsmidFpAcc
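
A minimal sketch of how the esr value in the log above breaks down, using the architectural field layout of ESR_EL1 (EC in bits [31:26], IL in bit [25], ISS in bits [24:0]). The helper names are hypothetical and are not taken from the ZigKernel handler.

```zig
const std = @import("std");

// Exception class: bits [31:26] of ESR_EL1.
fn exceptionClass(esr: u64) u64 {
    return (esr >> 26) & 0x3f;
}

// IL, bit [25]: set when the trapped instruction is 32 bits wide.
fn is32BitInstruction(esr: u64) bool {
    return ((esr >> 25) & 1) == 1;
}

// Instruction specific syndrome: bits [24:0].
fn syndrome(esr: u64) u64 {
    return esr & 0x1ffffff;
}

pub fn main() void {
    const elr: u64 = 1073753312; // values copied from the log above
    const esr: u64 = 534773760;
    std.debug.print("elr = 0x{x}\n", .{elr});
    std.debug.print("esr = 0x{x}, EC = 0x{x}, IL = {}, ISS = 0x{x}\n", .{
        esr,
        exceptionClass(esr),
        is32BitInstruction(esr),
        syndrome(esr),
    });
}
```

In hex the logged values are elr = 0x40002ce0 and esr = 0x1fe00000. The EC field comes out as 0x07, which matches the 0b000111 class quoted above, and IL is set, which matches the "32 bit instruction trapped" line.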

Edit: So after playing around with it and being able to "properly" debug (because of the interrupt handler), I can now reproduce a (the?) issue consistently. This is still only in ReleaseSmall; all other build modes don't run at all, or only partially, without throwing exceptions.
Apart from not being able to use struct returns, which I could kind of evade by not using them, this time it's the volatile keyword in combination with a DMA write. To be more precise, I'm creating a struct whose address I then write to an MMIO address. Next I loop over a member of the struct (for which I cast the pointer to volatile). If I do so, the exception from above is thrown. If I don't declare it as volatile, it is not (but then the struct member is not read correctly). Also, if I don't pass the struct's address to the MMIO, the issue does not happen.
I don't think it makes sense for me to continue this issue, since all of the symptoms are somewhat specific and I can't really tell whether it's my code (unlikely though, as it runs fine in C) or Zig.

Answer 6 · 2022-08-02T09:16:38.000Z

So this is interesting. The exception kind of confused me (Access to SVE, Advanced SIMD or...) because I did not do anything with FP or SIMD. But the Zig compiler is still kind of a black box to me...
But hey, I solved the problem: I had not enabled FP/SIMD operation in the boot asm. When I enabled it (not intentionally, at least not with this issue in mind, I just wanted to properly configure the CPU lol)

mov x0, #3 << 20
msr cpacr_el1, x0

most of the issues above were resolved and my code ran just fine.
I thought that the FP/SIMD registers were touched unintentionally (because of a miscompile or a memory issue on my side), but apparently there is a need for such operations... (Issues with structs still remain, but they had nothing to do with the last 3 updates on this issue, including this one.)
Cheers
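
For readers wondering why the value 3 << 20 matters: a small sketch of the CPACR_EL1.FPEN field (bits [21:20]) and when it traps FP/SIMD instructions executed at EL1. The helper names are hypothetical, and the starting value of 0 is an assumption, since the thread does not show what CPACR_EL1 held before the fix.

```zig
const std = @import("std");

// CPACR_EL1.FPEN occupies bits [21:20].
const fpen_shift = 20;

fn fpen(cpacr: u64) u64 {
    return (cpacr >> fpen_shift) & 0b11;
}

// FPEN encodings: 0b00 and 0b10 trap FP/SIMD at EL0 and EL1,
// 0b01 traps at EL0 only, 0b11 traps nothing.
fn trapsAtEl1(cpacr: u64) bool {
    const f = fpen(cpacr);
    return f == 0b00 or f == 0b10;
}

pub fn main() void {
    const before: u64 = 0; // assumption: FPEN cleared before the boot code touches it
    const after: u64 = before | (@as(u64, 3) << fpen_shift); // same value as "mov x0, #3 << 20"
    std.debug.print("before: cpacr = 0x{x}, traps at EL1 = {}\n", .{ before, trapsAtEl1(before) });
    std.debug.print("after:  cpacr = 0x{x}, traps at EL1 = {}\n", .{ after, trapsAtEl1(after) });
}
```

With FPEN set to 0b11 the register value is 0x300000, and FP/SIMD instructions at EL1 are no longer routed to the synchronous exception vector.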

Answer 7 · 2023-10-10T16:00:09.000Z

Dropping a comment here for future generations... I ran into exactly the same problem.

It's kind of confusing that assigning a struct requires SIMD instructions to be allowed, except that it happens because struct copies are done using the Q registers. These are intended for FP/SIMD operations, but they are also useful for copies because they allow a lot of bits to be moved in a single instruction.

That instruction just happens to be one that traps as access to SVE or advanced SIMD.