t-crest/patmos-simulator

Stack load/store bundled with stack cache management instruction

Opened this issue · 3 comments

Emoun commented

When bundling a stack load/store with a stack management instruction (e.g. sresi, sfreei) the management instruction is effectively executed before the load/store.

Example:

                .word   100;
                addi	r1 = r0, stack_base
                addi	r2 = r0, 123
                addi	r3 = r0, 0
                mts		s5 = r1
                mts		s6 = r1
                sres	2                                            #1
                sws		[0] = r2	                     #2
                sfree	2 || lws		r3 = [0]	     #3
                halt;
                nop;
                nop;
                nop;
stack_base:
                .word   0;
                .word   0;
                .word   0;
                .word   0;

Here, I would expect #3 to work, because the load and sfree should not affect each other. However, pasim issues the error Stack size exceeded: Reading beyond the current size of the stack cache. By looking at the code, I can see that it is because the pipeline is emulated first for the first slot, which changes the stack size to prohibit the second slot from issuing the load (the stack cache is not pipelined in the simulator nor does it account for the sequential nature of the simulator in contrast to what I would expect from hardware).
Likewise, moving the #2 store up to be bundled with #1 will work, because the sres will reserve the stack first, allowing the store to resolve after.

I would expect both of these to be wrong. I would expect the default behavior of bundling to be that the two instruction don't affect each other meaning the #1 || #2 should fail because the stack is not ready before the store and #1 to succeed because the stack was ready before the load.

Note: Stack load/store are the only load/stores that are allowed in the second slot by default in the simulator. I don't know if the hardware is the same.

@schoeberl Thoughts? What should be the correct behavior and what does the hardware do? The handbook does not cover this as far as I can tell.

Having a stack pointer manipulation instruction and using that pointer in the same bundle is a bit dangerous. Maybe you can look it up in the hardware to see what will happen first. Is this code you generate manually or compiler output?

Emoun commented

I hand wrote the above but hit the problem while working on dual-issue single-path. I'll look at the hardware code and see what I find.

This is not generally a problem for the compiler as it doesn't put stack load/store in second slot.

Emoun commented

Looking at the hardware code, I suspect a cache control instruction will affect a bundled stack load/store, regardless of which slot they're in (I think the simulator does the same, not sure).

A stack control instruction will issue a stack operation to the stack cache in the execute stage (see here).
A stack load/store will issue the address to use in the memory stage (see here)
Therefore, since the stack cache stores its pointers to memory itself, the control instruction will take effect after the execute stage while the address for the load/store will only be issued after the memory stage, making the addresses relative to the new cache configuration.

It is more intuitive for me that the cache control only takes effect after the whole bundle is done. The current situation is difficult for the compiler because we model the stack cache control instructions as "writing" to the cache, which is enough to handle the "intuitive" (for me) semantics in scheduling. If we say the current implementation needs to stay, these instructions now have a non-standard semantic function that needs its own way of being modeled in the compiler without the help of LLVM.

Note: The hardware does not support stack load/store in the second slot, as far as I can tell, so this problem cannot occur there currently.

My current implementation for single-path code simply disallows stack control and stack load/store in the same bundle, so this is not a critical issue. But if we eventually choose to enable the permissive dual-issue stuff, the semantics of such cases has to be established. (I vote for allowing it)

Since simulator currently allows something the hardware and handbook disallow and isn't used by the compiler (stack load/store in second slot), I will look into putting it under the ''--permissive-dual-issue" flag instead.