adam-mcdaniel/oakc

Because variables are stored statically, recursion is effectively impossible

Closed this issue ยท 10 comments

Hello. It appears variables are stored statically. This means that every call to a given function will operate on the same local variables. This is not good. It means a function cannot call itself and expect the callee to operate on different local variables than the caller.

This is described in the readme:

When a variable is defined, it's given a static location on the memory tape.

I have written a simple program that demonstrates this:

fn recursive(a: num) {
	let number: num = a + 1;
	if number < 5 {
		recursive(number);
	}
	putnum(number);
}

fn main() {
	recursive(0);
}

When compiled by oak and run, the output is 55555. In a language where local variables are scoped between function calls (that is, nearly every other language), the output would be 54321.

To fix this, local variables should be stored on the stack, and the top of the stack at the time the function was called should be stored in some way, and returned to when the function returns.

Most x86 compilers do something like this:

; rsp is the stack pointer
; rbp is the frame pointer, essentially used to store where the stack was when the function was called
recursive:
	push rbp
	mov rbp, rsp
	sub rsp, 16 ; make room for 2 64 bit variables
	; do something with those local variables
	mov rsp, rbp
	pop rbp
	ret

Some compilers omit the frame pointer entirely and rely on compile-time knowledge on how much stack space was used during a function call (-fomit-frame-pointer in GCC for example).

Example of GCC assembly for equivalent function

Functions should store local variables on the stack, or, if that is not desired, it should be clearly documented that every local variable is effectively static.

Note that the leave instruction that GCC generates is equivalent to the following:

	mov rsp, rbp
	pop rbp

I could be wrong, but this doesnt seem to effect tail call recursion at all. I'd be willing to accept tail call recursion and make recursion before the end of a function a compiler error.

You are right that this does not affect tail calls. Throwing a compile error could be an option, but it would require keeping track of every function call that every function makes. For example, if function a calls function b, and b calls c, and somewhere down the line some function calls a, that would cause as local variables in the first call to be over written.

Additionally, this will make multithreading nearly impossible, unless the number of threads is known at compile time (a pretty big limitation itself) and each thread has its own static local variables.

You're right about this. This can be fixed with a pretty significant amount of work, although most of the work is just verifying that the new implentation using the stack is algorithmically sound. Thankfully, though, this problem is entirely restricted to asm.rs.

I think that, for each AsmExpression and AsmStatement method, if we added a local_scope hashmap parameter, and changed var_size to global_scope, we would be able to use the stack to operate on variables instead of static positions.

This would require an additional instruction for the virtual machine to retrieve the stack pointer, but I don't mind that at all.

I'm excited to work on this issue; I didn't realize it was that big of a deal. Thank you very much for pointing it out.

I have a branch with a seemingly working patch. I've still got to confirm everything REALLY is as good as it seems, but there should be a PR sometime tonight. Also, expect a PR for explicitly static variables as well.

Every example I've tested so far, including bf.ok (the most complicated), works flawlessly, which is a great sign.

Great to hear! Glad this bug is being fixed.

Me too! I'm very pleased with how straightforward everything is to implement. Adding stackframes was relatively easy due to the minimalistic structure of asm.rs. I'm very satisfied with how well the architecture of the compiler adapts to changes.

This is addressed by PR #34. I tried to be as detailed as possible in the documentation for the three new virtual machine operations because they are a bit complicated. For specific information on the implementation for each of these new virtual machine instructions, look at the c implementation's machine_establish_stack_frame, machine_end_stack_frame, and machine_load_base_ptr functions. Additionally, checkout the new readme in the PR for a simplified explanation of the functions' implementations.

The PR demonstrates in great detail how the stack frames function now.