adam-mcdaniel/oakc

MAR backend

Opened this issue · 15 comments

Hi,

Thank you for this amazing project. I have been trying to write a C compiler for Much Assembly Required for a while but never seem to get past the beast called the C standard.
This project allows me to focus on finishing a source to source translation without getting stuck on the details.
You can see my attempt at implementing the backend here: https://github.com/kevinramharak/oakc/tree/mar/master

I do have a few questions tough:

  • Right now the backend is implemented in the same structure as c and go. I implemented most of it by using the C implementation as a reference. Are there any plans to document how a backend is implemented? For example std.ok is very opaque at the moment. Maybe i could even help write some of the docs?
  • Is it possible to provide more debug information to the Target trait? Debugging the generated assembly with 0 information from the original context is hard. I understand the HIR and MIR alllow for these abstractions, but preserving function names to the fn_definition would already be a step in the right direction.
  • It is a bit unclear what the expected result is for the examples. For example refer.ok
type Counter(1) {
    fn new() -> Counter { 0 }

    fn count(self: &Counter) -> &num { self }

    fn increment(self: &Counter) -> void {
        self->count = self->count + 1;
    }

    fn decrement(self: &Counter) -> void {
        self->count = self->count - 1;
    }
}

fn inc(c: Counter) -> void {
    c.increment();
    c.increment();
    c.increment();
    putnumln(c->count);
}

fn main() -> void {
    let c: Counter = Counter::new();
    putnumln(c->count);
    inc(c);
    putnumln(c->count);
}

prints out:

0 // expected 0
3 // expected 3
0 // expected 3

Is it intended that inc has copy behaviour?

If you have spare time I would appreciate it if you could look review my rust implementation as the compiler does not like the way i generate unique labels for while loops. Any nudges in the right direction would be appreciated.

Hello. Thank you so much for exploring my compiler, it means a lot to me that someone would want to use it for a personal project. Let me address your questions one by one.

Right now the backend is implemented in the same structure as c and go. I implemented most of it by using the C implementation as a reference. Are there any plans to document how a backend is implemented? For example std.ok is very opaque at the moment. Maybe i could even help write some of the docs?

Yes, I plan to add a README for each of the individual folders in the project for contributors' sake, especially the target folder. Having the ability to choose from several backends is a huge part of this language, and I want to strengthen that ability. If you want to contribute to the project in any way, I would absolutely love that too :)

Is it possible to provide more debug information to the Target trait? Debugging the generated assembly with 0 information from the original context is hard. I understand the HIR and MIR alllow for these abstractions, but preserving function names to the fn_definition would already be a step in the right direction.

Yes, adding debug information could be done. Thankfully, function names are only replaced with ID's in the final and most simple layer of IR in asm.rs. This would be an easy fix, but I think it should be done right. The reason I use an i32 to name a function is to prevent colliding with identifiers in the target language, such as printf or something. A prefix could be added to the function name to prevent collisions, but I also worry about outputting valid identifiers in target languages. I know it's definitely not a problem somebody will likely run into ever, but I can't help but feel like its less sound than using an i32 ID.

I think the proper way is to feed the Target the function ID and the name of the function; the target could use the ID like before, and the function name could be placed in a comment before the definition. There are many places in the code where information like this could be added to the target output without much work.

It is a bit unclear what the expected result is for the examples. For example refer.ok

You're absolutely right. When I wrote the examples, I was mostly trying to debug the compiler after adding some feature. I think it would be helpful to add a header comment for each file explaining its purpose, and more comments in the code.

If you have spare time I would appreciate it if you could look review my rust implementation as the compiler does not like the way i generate unique labels for while loops. Any nudges in the right direction would be appreciated.

I would love to! I couldn't seem to find a branch with a Rust target implementation on your github, though. Have you pushed it yet?

Thanks for the response. Im still finishing the implementation to work for all examples after that i will explore the architecture for the Target trait to see if I can start a draft.

Yeah the debug information should not be used by the backend to generate code. Just to provide some debug information in the result so its easier to figure out how the generated code maps to the original source. I think its good practise to use a unqiue id.

I made a fork of oakc and made a branch MAR-Backend. You can visit it here https://github.com/kevinramharak/oakc/tree/mar/master/src/target/mar.rs. It has a std.mar file in the same folder which implements the virtual machine and std.ok functions in mar assembly.
Right now theres some extra files and folders i generate for testing, but those will be removed once its all working.

That's absolutely wonderful. I'm really excited to see how well your new Target implementation goes. If you have any questions, I can definitely help. How is it working so far? Are there any problems?

I have every example working except for bf.ok. I have not investigated it yet as debugging the generated assembly is pretty hard. It seems to execute it until the end and it prints a correct dump of the tape. It just does not write the "Hello World!" output and the tape values are completely different from when i run the binary so something does not add up. I'm also not 100% sure about my allocate implementation. I'm going to write some tests to figure out if it works correctly.

The current live version has a broken console so I can't link it yet, but I even got input.ok working on a local instance. It is really fun to see how easy it is to implement the std.ok functions in assembly and just 'link' it all together and it just works. Am looking forward to how the FFI is going to work. The Cubot you control with the assembly has some 'hardware' that can be interfaced with by using special instructions. So I can implement simple wrapper functions to execute these instructions with the vm stack as input and output and it just works. Also interested in your take on external symbols as this would allow for more powerful code reading directly from memory locations.

I am keeping up to date with upstream, so the new sign opcode should work soon. I am also planning to try and get a gh-pages branch running where I compile the compiler (or at least enough of it) to web assembly so people won't have to install rust just to take a look. So you could write your oak in a textarea, get your resulting assembly code and copy it into the editor on the MAR site. But that just something im planning on to try and get more people hooked by having a easy entrypoint to the project.

Simple screen shot of running file.ok:
https://files.slack.com/files-pri/T7UJ5SESC-F01811U8XRS/image.png

That's absolutely incredible. With the bf.ok file, do you think it's a problem with your Target implementation, or are you worried that there is a problem with the Oak compiler itself? I would write a test program something like the following to test your alloc implementation.

#[heap(512)]
fn main() {
    let size = 32;
    let x: &char = alloc(size);
    // This should be something around 500 something depending on
    // how many variables are used in the included `std.ok` functions.
    // The more variables defined in those functions, the larger the stack is,
    // and the further away from address 0 the heap begins on the memory tape.
    putnumln(x);
}

I found, using programs somewhat similar to this, that my alloc core instruction was not at fault, but that the stack was to blame. I've also found that measuring the initial stack size versus the final stack size is crucial for discovering errors like this. Oak programs should NEVER be able to compile output code with a net negative impact on the stack. So, if you can find a contradiction to this, you've found a compiler error or an error in the way your VM code manages your stack. Keep in mind the information in #14.

Also, with respect to the webassembly implementation, an alternative compile function needs to be written in lib.rs for that to be possible. Right now, HIR compiles include directives which read from files on the disk. A new source_compile function, for example, needs to be written which throws errors when using directives which read from files. This will be an incredibly simple issue to solve.

I'd also love to hear your perspective on #10 :)

Okay i got the bf.ok file working for 50% (it prints 'Hello ' and then stops executing :D). The problem was with how i generated labels for the while_begin and while_end. Now that i use a stack to keep track of nested while_begin's it seems to get a lot further. Probably still a bug somewhere, but im mostly running into MAR backend issues now so thats a different project.

Im going to put the webassembly branch off until I have at least the core functionality working. Tnx for the headers. I think I can hack a demo once I put a few hours in to it.

Since MAR has a fixed binary size of 128k (64k x 16-bit words) I have not implented the heap directive yet. Right now it dumps data at offset 0, code after data and then a label at the end of the code. This means everything between that label and the stack pointer (which starts at 2 ** 16 and goes down) is open for grabs. That does bring up an interesting point. What should an implementation do when it finds a compile time error like the requested heap size is to large? Right now every function returns a String, can i return an Result instead with an error?

Maybe we should add a fn max_mem_capacity(&self) -> Option<usize>; method to the Target trait, and throw an AsmError when assembling an AsmProgram. This would be really simple to add.

That seems like a good solution for now.

I can't believe every other example works though, thats absolutely incredible. I'm throroughly impressed that someone else was able to add another backend, which compiles to psuedo 8086 assembly, no less. I think that's solid proof that this project accomplishes what I've always wanted from the compilers I've previously written. I'm optimistic about where this project is going. I wonder if your MAR backend could be adapted to real 8086 assembly to run on DOS....

Well I think the setup of HIR, MIR and ASM makes it very easy to transpile the program to a different source. It seems like you can add features and syntax on the fly without being weighed down by the compiler itself. As for a backend all you need is a runtime implementation and to generate code that implements the opcodes. The runtime took me some work as I found it quite hard to express abstract ideas in assembly. But the Oak backend interface makes it trivial to interface the Oak code into the runtime.

I'm not sure about adapting to real 8086. It is quite a level up and it is missing a lot of features that a linker usually does. I'm sure someone can. The author of MAR is back on the project to get it up to speed again so i'm hoping to get a solid version working soon, preferably with a low entry point github pages setup to show people how easy it is. After that i'm hoping the POC can get people's attention and people get interested in writing their own backends.

So i got sick of not being able to debug the bf.ok file correctly since my backend was only 'debuggable' when the virtual machine actually worked correctly. I decided to have a major refactor with the changes you made to how core and std now integrated. I am implementing some basics of the C library in assembly without any dependencies on the virtual machine. This would allow me to easily debug the virtual machine with working normal assembly.

One issue i was having was that MAR has no module system nor imports and dependecies. So I got the idea to use the oak compiler itself to generate the std.mar code :D. I am confused that I could get it to work so easily. I can now write a small assembly module like prn.mar and an oak file prn.mar.ok. The oak file does an #[include] to any dependecy assembly files (they all have include guards) and then includes the assembly with a #[extern]. I don't know if you like it, but as long as MAR itself has no multifile support im going to keep it in. Tough if you have any tips how to refactor it into something modular and external I would be all ears.

I plan on generating the *.mar.ok files with some simple doc comments. Im starting to like Rust now. It complains a lot once you figure out what the compiler expects you can get a lot done with very few lines.

Take a look at the branch at https://github.com/kevinramharak/oakc/tree/mar/master

@adam-mcdaniel After some refactoring to make the code easier to debug I am happy to announce that the brainfuck interpreter works. Also on the note of x86 target I noticed this fork https://github.com/Not-Nik/oakc.

This is unbelievable!!!! Does the new MAR implementation work with the automatic memory management pull request?? I'm very interested to see a demo or video of the MAR version in action. Also, the x86 implementation @Not-Nik is working on looks AWESOME. I had no idea someone was working on another backend!!!!!!! Im excited :)

Ngl I kinda left this fork to just be some waypoint on how a possible x86 backend could look. With the addition load_base_ptr(), establish_stack_frame(), and end_stack_frame() into the IR I got somewhat lost on the implementation. A best case for me would be a clear memory map on how you imagine a backend to handle things. Also great language you made there ;)

The examples in examples/copy_drop print the same output as when running the executable generated by the C backend. Im working on implementing the C stdlib and writing the ffi functions so its easier to test other cases than 'just' printing out strings.

For example:

#[std]

// this seems to be a valid function name, really usefull for namespacing
fn C::strlen(string: &char) -> num {
    __ffi_pass_arg_by_reference!(string as &void);
    __c_strlen!();
    return __ffi_pass_return_value!() as num;
}

#[doc("Copies `length` amount of words from the location pointed to by `source` to the memory location pointed to by `destination`.")]
fn C::memcpy(destination: &void, source: &void, length: num) -> &void {
    // implement it in oak
    for (let i = 0; i < length; i += 1) {
        (destination as &num)[i] = (source as &num)[i];
    }
    return destination;
    // if we need performance/less code size we can implement it in assembly and make this function a wrapper for it instead
    // __ffi_pass_arg_by_value!(length);
    // __ffi_pass_arg_by_reference!(source);
    // __ffi_pass_arg_by_reference!(destination);
    // __c_memccpy!();
    // return __ffi_pass_return_value!() as &void;
}

fn main() {
    let string = "Hello World!";
    let length = C::strlen(string);
    putnumln(length);
}

The __ffi_* functions are small assembly routines to copy values, references and return values from the oak stack to the assembly stack thus providing the bridge between oak and native code without having to make the assembly function to be aware of how the oak vm works.

I'll try to record a demo soon enough. Its a bit hard to show it because im working on a non-released branch of Much Assembly Required that has a debugger (lifesaver @simon987) and interrupts implemented. Once those are released into the main branch you can just checkout my fork, compile a program with the --mar flag and paste the contents into the online MAR editor.